Mining Your HTTP Server Logs for Statistical Gold

Congratulations! You finally have that Web site up and running and are now getting hundreds, if not thousands, of visitors each day, but is your Web site meeting your business objectives? Do you know who is visiting your site and what they’re looking at or, perhaps more importantly, not looking at? Log files created as byproducts of running the HTTP Server for the AS/400 can answer these questions and more. Let me show you how.

Who, Where, What, Etc.

Log files generated by the HTTP Server can tell you who is visiting your site, where they are coming from, what they are looking at, and much more. The server logs every request for a Web page, image, or Common Gateway Interface (CGI) program on your site, so you can study the paths users take through your site. The logs can tell you which pages people enter your site on, where they came from (e.g., search engines or links from other Web sites), and what page they were on before they left your site. Add to that all the technical statistics, and you have access to tools that can really help your site be the best it can be.
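Entry and exit pages are not stored in the log. An analyzer derives them by grouping individual requests into visits. The Python sketch below shows one way to do that under two assumptions that are not part of the HTTP Server: a visitor is identified by IP address, and a visit ends after 30 minutes without a request from that address. The helper name `entry_and_exit_pages` is hypothetical, and the (IP address, timestamp, URL) tuples would come from parsed log entries.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes without a request from the same address.
VISIT_TIMEOUT = timedelta(minutes=30)


def entry_and_exit_pages(hits, timeout=VISIT_TIMEOUT):
    """Group (ip, timestamp, url) hits into visits; return (ip, entry_url, exit_url) per visit."""
    by_ip = defaultdict(list)
    for ip, when, url in hits:
        by_ip[ip].append((when, url))

    visits = []
    for ip, requests in by_ip.items():
        requests.sort()  # chronological order for this address
        first = last = requests[0]
        for current in requests[1:]:
            if current[0] - last[0] > timeout:
                visits.append((ip, first[1], last[1]))  # gap too long: close the visit
                first = current
            last = current
        visits.append((ip, first[1], last[1]))  # close the final visit
    return visits


hits = [
    ("192.0.2.10", datetime(2000, 4, 22, 0, 4, 38), "/index.htm"),
    ("192.0.2.10", datetime(2000, 4, 22, 0, 6, 0), "/html/as400_other.htm"),
    ("192.0.2.10", datetime(2000, 4, 22, 2, 0, 0), "/contact.htm"),
]
print(entry_and_exit_pages(hits))
# [('192.0.2.10', '/index.htm', '/html/as400_other.htm'), ('192.0.2.10', '/contact.htm', '/contact.htm')]
```

IP-based visits are an approximation: several users behind one proxy look like a single visitor, and one user whose ISP hands out a new address mid-visit looks like two.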

The HTTP Server Extended Log File

With V4R3, IBM introduced the National Center for Supercomputing Applications (NCSA)-compliant “extended” log format. This industry-standard log file format contains the data you need to analyze your Web site and discover what is working and what is not. This log file format standard is maintained by the World Wide Web Consortium (www.w3.org/TR/WD-logfile).

Figure 1 shows a single entry from a typical HTTP Server log file. The HTTP Server creates a new log file each day, which is consistent with industry-standard practice on Apache and other major servers. The 10 fields shown in the figure are industry-standard fields captured by all Web servers (on any platform) and can be interpreted and analyzed to present all the previously mentioned types of statistics.
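Because every entry follows the same layout, a few lines of Python can split it into named fields. This is a minimal sketch, assuming the one-line layout shown in Figure 1 (often called the NCSA combined format). A custom log format, such as one that adds the Web server name, would need a different pattern, and the field names used here are illustrative choices rather than anything the HTTP Server defines.

```python
import re
from datetime import datetime

# Layout of one entry (see Figure 1):
#   host ident user [date] "method url protocol" status bytes "referer" "agent"
# The referer and agent fields are optional, so common-format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_entry(line):
    """Split one access-log line into named fields; return None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # "-" is the log's placeholder for "no data"; turn it into None.
    entry = {k: (None if v == "-" else v) for k, v in match.groupdict().items()}
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # The number after the status code is the response size in bytes.
    entry["bytes"] = int(entry["bytes"]) if entry["bytes"] else 0
    return entry
```

Once entries are dictionaries, most of the summaries a log analyzer produces come down to counting and grouping over them.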

When considering logging and log analysis, you need to decide where to put your logs. The HTTP Server supports storing logs in a QSYS.LIB DDS database file, a QDLS shared folder, or the AS/400 Integrated File System (AS/400 IFS) “root” directory system. I strongly suggest using the AS/400 IFS root directory system. I create a root-level directory called something like WEBSITES to be the common area for all my Web servers and their resources, and, under WEBSITES, I create a subdirectory for each Web server instance that I run on my machine. Say I call one of the server instances PUBLIC. I would create a PUBLIC subdirectory under WEBSITES and then create a LOGS subdirectory beneath it to store all my HTTP Server logs for this server instance. Therefore, my logs would be stored in /WEBSITES/PUBLIC/LOGS.

I strongly advise acquiring a commercial log analysis tool to analyze your log files, because using the log by itself limits the business intelligence that you can obtain from it. Log analyzer vendors earn their money by understanding the relationships between records in your log file. If you reviewed the output of a commercial log analyzer, you would be amazed to see the amount of information that it can obtain from these 10-field records. Good log analyzers, however, do more than provide basic services, such as resolving domain names of IP addresses listed in your log. Good log analyzers perform a custom NSLOOKUP (an Internet utility that retrieves information from a Domain Name System [DNS] server about an IP address or a domain name) to obtain additional information about users from the InterNIC. (The InterNIC is the organization that operates and maintains the root name servers storing domain name registration information.) Some log analyzers are free; others vary in price. Some of the very best log analyzers cost less than $300.

If you insist on storing your logs in DDS files, follow the steps under logging in the HTTP Server for AS/400 Webmaster’s Guide V4R3 or the HTTP Server for AS/400 Webmaster’s Guide V4R4. Once you have configured the server and begun logging, you can use any standard programming language or query facility to produce your own reports. Be warned, however, that the value of any report produced with a query facility on the AS/400 is extremely limited.

Reporting Tools

In addition to analyzing the extended log file, IBM offers two additional facilities for monitoring activity on your HTTP Server interactively: the Monitor and Basic Web reporting features. The Monitor facility provides snapshots of basic HTTP Server statistics, whereas the Basic log reporting facility provides an interactive view of the access and error logs. There is a third option called Web Mining, but it is not supported if you choose the extended log file format. The extended log format is the log file format that conforms to the aforementioned standards published by the W3C. Together, the extended log file format and a good log analyzer produce much more information than IBM’s Web Mining report anyway. (I do not cover the built-in Web Mining report here.)

Configuring the HTTP Server for Logging

Be sure that your HTTP ADMIN server is running. To check, open Operations Navigator, select Network, Servers, TCP/IP, then look at the list of servers in the right-hand window. If you see HTTP Administration with the “started” status, it is running. If it shows “stopped,” right-click on the name of the server, and then click on Start. Once the ADMIN server is running, you can access it from a browser by typing www.mycompany.com:2001 in the Location field. To modify the configuration of HTTP servers, you need *IOSYSCONFIG special authority in your AS/400 user profile. After your browser connects to your AS/400, select the IBM HTTP Server for the AS/400 from the list of services displayed and then select Configurations and Administration from the next page. Click Configurations and select a configuration from the drop-down list.

Select Basic, but be sure that the check box labeled Look up host name of requesting clients is not checked. Checking this box causes the server to do a DNS lookup on every user accessing your site and write the user’s domain name to the log file. The cost (in time) of doing DNS lookups is not worth the little bit of data you get by enabling this feature; a good log analyzer does the lookups when you run it and obtains a great deal more information.
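The same trade-off applies if you ever post-process logs yourself: resolve addresses when the report runs, and look up each distinct address only once. A minimal Python sketch, assuming a list of IP addresses already extracted from the log; `resolve_hosts` is a hypothetical helper name, and the reverse lookup itself is the standard library's `socket.gethostbyaddr`.

```python
import socket


def resolve_hosts(ips, resolver=socket.gethostbyaddr):
    """Reverse-resolve each distinct IP address once; unresolvable ones map to None."""
    names = {}
    for ip in ips:
        if ip in names:
            continue  # cached: each address costs at most one DNS query
        try:
            # gethostbyaddr returns (hostname, aliases, addresses)
            names[ip] = resolver(ip)[0]
        except OSError:  # socket.herror and socket.gaierror are OSError subclasses
            names[ip] = None
    return names
```

The `resolver` parameter exists only so the lookup can be swapped out, for example in a test or to plug in a faster DNS library.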


To update your configuration, click the Apply button. Next, click the Logging item on this menu, followed by Global Log File Settings. Under Global Log File Settings, you must choose how the server will log date and time information and what type of log you wish to create. I strongly suggest using local time, since the bulk of your log file reports will go to business users throughout your organization, and the statistics are more meaningful when presented in local time. If you choose local time, be sure that the QUTCOFFSET system value contains a valid offset in hours for your time zone. For example, my machines are in Los Angeles, which is -8 hours from UTC during standard time and -7 hours during daylight saving time. Unfortunately, this value must be set manually whenever you change from standard time to daylight saving time or vice versa. The HTTP Server uses QUTCOFFSET and QTIME to calculate appropriate times for log entries.
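The arithmetic behind a local-time entry is UTC plus the offset, so a stale offset shifts a timestamp by the missed hour. This Python sketch applies a fixed offset the way a manually maintained setting would. The helper names are hypothetical, and the sketch does not model exactly how the HTTP Server combines QUTCOFFSET and QTIME.

```python
from datetime import datetime, timedelta, timezone


def local_log_time(utc_time, offset_hours):
    """Convert an aware UTC datetime to local time using a fixed hour offset."""
    return utc_time.astimezone(timezone(timedelta(hours=offset_hours)))


def offset_string(offset_hours):
    """Format an hour offset as +hhmm or -hhmm, like the zone in the log's date field."""
    sign = "-" if offset_hours < 0 else "+"
    total_minutes = round(abs(offset_hours) * 60)
    return f"{sign}{total_minutes // 60:02d}{total_minutes % 60:02d}"


# A request made at 00:04:38 Los Angeles time on 22 April 2000 (daylight saving time).
utc = datetime(2000, 4, 22, 7, 4, 38, tzinfo=timezone.utc)
print(local_log_time(utc, -7).strftime("%d/%b/%Y:%H:%M:%S %z"))  # 22/Apr/2000:00:04:38 -0700
print(local_log_time(utc, -8).strftime("%d/%b/%Y:%H:%M:%S %z"))  # 21/Apr/2000:23:04:38 -0800
```

In the sketch, converting the same instant with -8 instead of -7 yields 23:04:38 on the previous day. Whenever local times are derived from UTC this way, a stale offset misplaces requests by an hour, which can move them into the wrong day's statistics.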

Click the Access Log File item in the menu frame and type the fully qualified IFS ROOT file path to the location where you wish to store your log files. I suggest setting the log size to zero to turn off size checking and allow the file to grow as large as needed. If you specify a size and the log file reaches that size during a day’s processing, the server stops logging and you lose data. Use the default format unless you are using virtual named servers and need to create a custom log format that contains the name of the Web server.

The log file maintenance section of the form allows the HTTP Server to delete old log files on the schedule that you specify. If you click Keep logs, the server creates a new log file each day and leaves old files in the access log subdirectory until you remove them manually. I use a 45-day time limit to clean up these files. This allows time for running monthly statistics and automatically keeps my system free of large, disk-consuming files.
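The server's own maintenance setting handles this on the AS/400. If you also keep copies of the logs elsewhere (on the PC that runs your analyzer, for instance), the same 45-day rule is easy to apply yourself. A minimal Python sketch, assuming a file's age is judged by its last-modified time; the function names are hypothetical, and how the HTTP Server itself decides a log's age is not covered here.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def expired_logs(directory, keep_days=45, now=None):
    """Return sorted paths of files in `directory` last modified more than keep_days ago."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * SECONDS_PER_DAY
    with os.scandir(directory) as entries:
        return sorted(
            e.path for e in entries
            if e.is_file() and e.stat().st_mtime < cutoff
        )


def purge_logs(directory, keep_days=45, now=None):
    """Delete the expired files and return the paths that were removed."""
    removed = expired_logs(directory, keep_days, now)
    for path in removed:
        os.remove(path)
    return removed
```

Calling `expired_logs` first gives you a dry run: you can review the list before `purge_logs` deletes anything.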

You can also choose to exclude server IP addresses or host names, user agents (browser types), methods, Multipurpose Internet Mail Extension (MIME) types, or return codes, and there is an Excluded URLs setting as well, but I do not use these exclusions. If you enter a URL in the Excluded URLs box, the server does not log any requests for that URL. I prefer to log everything in the log file and use my log analyzer to subset the data.
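Subsetting at analysis time rather than at logging time keeps the raw log complete, so you can change your mind later. A minimal Python sketch of such a filter, assuming entries are dictionaries with `host`, `agent`, and `status` keys (as a log parser might produce). The function name is hypothetical, and so are any addresses or agent substrings you would pass in.

```python
def subset(records, exclude_hosts=(), exclude_agents=(), exclude_statuses=()):
    """Yield only the records a report should count; the raw log is left untouched.

    exclude_hosts and exclude_statuses match exactly; exclude_agents matches
    substrings of the user-agent field.
    """
    for r in records:
        if r["host"] in exclude_hosts:
            continue
        agent = r.get("agent") or ""
        if any(token in agent for token in exclude_agents):
            continue
        if r["status"] in exclude_statuses:
            continue
        yield r
```

Because `subset` is a generator, it can sit between a log reader and any report without copying the data.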

Once you make your choices and click the Apply button to update your configuration, you have completed all the basic configuration steps necessary to create an extended log file and start logging. However, you must stop and start your HTTP Server instance before these changes become effective.

Configuring the HTTP Server Monitor Function

Figure 2 illustrates a typical Monitor report. This view displays basic server activity statistics as of the moment you click the Monitor button on the Server Instances/Work with Server Instances display for the selected server instance. You can also display the total number of bytes transmitted and received and a list of URLs processed since the server was last started.

Click the System Management link on the menu frame and then click Activity Monitoring. Click the Enable activity monitoring support check box and make sure that a check mark appears in the box. Finally, click the Apply button. The next time you start your server, the Monitor display will be available.

To view the Monitor report, click Server Instances at the top of the menu frame. Click Work with server instances, select the server you wish to monitor in the list box, and then click the Monitor button. (Be sure that you have applied all current V4R4 5769-DG1 [HTTP Server] PTFs. Enhancements as well as fixes are delivered via PTFs. It is important that you keep your PTFs current.)

Tools of the Trade

Once your server is configured, it collects raw data containing pure gold. The trick now is to extract information from that raw data. Many Web log analyzers are available, at prices ranging from nothing to several thousand dollars. Unfortunately, I have not found any that actually run on the AS/400, so for now you probably want to focus on commercial products that run on a Microsoft Windows-based PC.

A commercial log analyzer is the result of analysis and design by companies focused on understanding Web logs and log data. The data in a single log record is interesting but does not contain the value that you get from analyzing relationships between many records, building summaries, and extracting statistics that don’t seem to exist in the basic data.
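To see why relationships between records matter, compare a plain hit count with a count of distinct visitors per page. A minimal Python sketch, assuming entries are dictionaries with `url`, `host`, and `referer` keys; the helper names are hypothetical, and treating each client address as one visitor is an approximation.

```python
from collections import Counter, defaultdict


def top_counts(records, field, n=10):
    """Return the n most common non-empty values of `field` (e.g. "url" or "referer")."""
    counts = Counter(r.get(field) for r in records if r.get(field))
    return counts.most_common(n)


def visitors_per_page(records):
    """Count distinct client addresses per URL, something no single record shows."""
    hosts_by_url = defaultdict(set)
    for r in records:
        hosts_by_url[r["url"]].add(r["host"])
    return {url: len(hosts) for url, hosts in hosts_by_url.items()}
```

A page can lead in raw hits yet reach fewer visitors than another page, because one visitor reloading it is counted many times. Only the cross-record count reveals that.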

Figure 1 provides a sample log record. The visitor’s IP address is 63.254.26.147, and, if you had DNS name resolution turned on, you would see HRBGA010-1163.splitrock.net in the domain name field of the log. What does this tell you? Well, if you used a WHOIS lookup, you would find out that this is most likely an ISP named Splitrock Services located in The Woodlands, Texas. Could you write code to do this yourself? Of course you could, but the AS/400 does not have a WHOIS utility. You would have to write your own TCP/IP sockets-based code to do it.
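For the curious, the WHOIS protocol itself is simple (RFC 3912): open a TCP connection to port 43, send the query followed by a carriage return and line feed, and read until the server closes the connection. This Python sketch shows that exchange; it is not AS/400 code, and the helper names are hypothetical. The default server, whois.arin.net (the registry for North American IP address blocks), is only an example choice; addresses registered in other regions are answered by other registries' servers.

```python
import socket


def build_whois_request(query):
    """A WHOIS request is the query text terminated by CRLF (RFC 3912)."""
    return query.encode("ascii") + b"\r\n"


def whois(query, server="whois.arin.net", port=43, timeout=10.0):
    """Send one WHOIS query over TCP and return the server's text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_whois_request(query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois("63.254.26.147")` needs network access to the WHOIS server; the reply is free-form text whose layout varies by registry, so extracting the organization name and location takes further, registry-specific parsing.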

A list of both free and commercial log analyzer products that run on various platforms (except the AS/400) can be found on the Access Log Analyzers Web site (www.uu.se/software/analyzers/access-analyzers.html). My personal favorite is the WebTrends Log Analyzer from WebTrends Corporation. This product runs on any Microsoft Windows 9x/NT/2000 PC and can access your AS/400 log files via NetServer or Client Access/400 (V3R2M0). If you are running V4R4, you want to configure your server to produce the EXTENDED log file; releases prior to V4R4 produce COMMON log files, which lack the data needed for reports on referrers, browsers, client operating systems, or search engines.

The WebTrends Log Analyzer reads your log files directly from the IFS directories on your AS/400. To give your PC access to files in the IFS, you need either Client Access/400 (V4R3 or earlier) or a configured NetServer (V4R3 or later). You can FTP the files to your PC, but doing so creates an administrative nightmare and limits the product’s automation and scheduling capabilities. WebTrends has robust file-selection and date-range selection logic for choosing the data to include in a report. My favorite setting is last full month: with it, the product figures out which log files to use and handles all the calendar logic for extracting the previous month’s data.
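The calendar logic behind a "last full month" setting is small but easy to get wrong around year ends and leap years. A minimal Python sketch with hypothetical function names; mapping each date to the actual name of that day's log file is left out, since the naming is not described here.

```python
from datetime import date, timedelta


def last_full_month(today):
    """Return (first_day, last_day) of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_day = first_of_this_month - timedelta(days=1)  # last day of the previous month
    return last_day.replace(day=1), last_day


def dates_in_range(first_day, last_day):
    """List every date from first_day through last_day (one daily log per date)."""
    days = (last_day - first_day).days
    return [first_day + timedelta(days=i) for i in range(days + 1)]
```

For example, a report run on 10 January 2000 covers 1 through 31 December 1999, and one run in March 2000 covers all 29 days of February.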

There are many options within the product to include or exclude domains, IP addresses, and much more. You can select which of the product’s standard reports to produce or customize them to fit your specific needs. (I run the full set of default reports each month.) You can also choose from many output options, ranging from hard copy reports to HTML pages published to a designated directory on your Web server (my favorite). You can even email reports to a distribution list.

A Sample of Available Statistics

WebTrends produces a robust series of charts and statistics in an attractive HTML format that you can distribute to interested parties in your organization by publishing to your intranet or extranet. Figure 3 is an example of some of these statistics. There are, of course, many more types of information and data you can retrieve, such as general statistics, URLs most visited, top demographic information, and activity levels by visitor.

The general statistics bar chart in Figure 3 provides a quick view of the number of visitors (not page hits) per day and breaks it into United States, International, and Unknown users. WebTrends identifies visitors by resolving IP addresses via NSLOOKUP and then counting unique IP addresses for the day. If it cannot resolve a visitor’s country of origin by using one of several techniques, that visitor is counted as unknown. The presentation of this chart is both a good design and a marketing technique: it catches your users’ attention with a great visual and provides a top-down, drilldown approach to analyzing your site. You see a tremendous amount of information at a glance, such as which period of the month was busiest. You may be surprised at the volume of international traffic, and perhaps management will want you to investigate further to discover an untapped market.
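The visitor counting described above (unique IP addresses per day, split by origin) can be sketched in a few lines of Python. The `classify` function is a hypothetical stand-in for the country resolution an analyzer performs with DNS and other techniques; anything it cannot place should come back as "Unknown".

```python
from collections import defaultdict


def visitors_per_day(hits, classify):
    """Count unique IP addresses per day, split by the region classify() returns.

    hits: iterable of (ip, day) pairs.
    classify: maps an IP address to a label such as "United States",
    "International", or "Unknown".
    """
    seen = defaultdict(set)  # day -> addresses already counted that day
    counts = defaultdict(lambda: defaultdict(int))  # day -> region -> visitors
    for ip, day in hits:
        if ip in seen[day]:
            continue  # repeat hit from the same address on the same day
        seen[day].add(ip)
        counts[day][classify(ip)] += 1
    return {day: dict(regions) for day, regions in counts.items()}
```

Counting each address once per day is what makes this a visitor count rather than a hit count.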

Go Wherever the Gold Is

WebTrends is, by far, the most popular commercial product. There are other log analyzers out there. For now, though, WebTrends is the best that I have found. If you find something as good or better, drop me a line.

REFERENCES AND RELATED MATERIALS

• Access Log Analyzers: www.uu.se/software/analyzers/access-analyzers.html
• HTTP Server for AS/400 Webmaster’s Guide V4R3 (GC41-5434-03, CD-ROM QB3AEO03)
• HTTP Server for AS/400 Webmaster’s Guide V4R4 (GC41-5434-04, CD-ROM QB3AEO04)
• National Center for Supercomputing Applications Web site: www.ncsa.uiuc.edu
• WebTrends Web site: www.webtrends.com

Typical Access Log Entry (wrapped for publication; normally stored on one line)

63.254.26.147 - - [22/Apr/2000:00:04:38 -0100] "GET /html/as400_other.htm HTTP/1.1" 200 5932
"http://www.ignite400.org/html/as400resource.htm" "Mozilla/4.0 (compatible; MSIE 5."

IP address of visitor             63.254.26.147
Visitor domain*                   -
User ID**                         -
Access date and time (local)      [22/Apr/2000:00:04:38 -0100]
HTTP request method               GET
URL requested by visitor          /html/as400_other.htm
HTTP protocol version             HTTP/1.1
Server return code                200
Bytes sent                        5932
Referer                           "http://www.ignite400.org/html/as400resource.htm"
Browser                           "Mozilla/4.0 (compatible; MSIE 5."

* only shows up if the server is configured to resolve domain names (NOT RECOMMENDED)
** the user ID shows up if the user is accessing an authenticated directory

The hyphen "-" is a placeholder indicating that no data for the field is available.

Figure 1: Typical log file entries like this one show a lot of information but can’t go as deep as a log analyzer can.

Figure 2: A typical Monitor report displays basic server activity statistics.


Figure 3: The general statistics report of the WebTrends Log Analyzer is a good design and handy marketing tool.


Bob Cancilla

Bob Cancilla is the IBM Rational System i Software evangelist helping to set strategy and adoption of IBM Rational application development and life cycle management software for System i customers. Bob joined IBM after over 30 years as an IT executive in the insurance industry. He was the founder of the System i eBusiness electronic user group www.ignite400.org, is the author of four books, and is an industry leader in the areas of application architecture, methodology, and large-scale integrated systems development.