University of Washington Computer Science & Engineering
 UW CSE About Logfiles

Background

The web server creates a set of log files as it runs. At CSE, we currently create these logs:

/cse/www/logs/common_log (all hosts)
Information about each access, including originating host, username when (rarely) available, time of the access, text of the request, the response status, the number of bytes transferred, the referring document, and the user agent.

/cse/www/logs/anonymized_log (www.cs.washington.edu only)
Like common_log, but the hostname/IP and username (when present) are rewritten as unique integers (1:1 mapping) and the logname is rewritten as "-" to obscure the source of the requests. This log is created daily at 1:30AM from the previous day's common_log.

/cse/www/logs/usertrack_log (www.cs.washington.edu only)
Output from mod_usertrack, which uses HTTP cookies to track user browsing patterns more accurately than permitted by the information in the common_log. See this document for information on the research uses of these logs.

/cse/www/logs/error_log (all hosts)
Information about errors.

/cse/www/logs/suexec_log (all hosts)
suexec is a mechanism for running CGI scripts under some user ID other than that of the (unprivileged) web server user "nobody." We use it heavily on hosts such as Abstract and Cubist, and lightly on WWW. The suexec_log collects informational and error messages for CGI scripts run under the suexec mechanism.

logs for virtual hosts
A number of virtual hosts are hosted on the same machine that serves www.cs.washington.edu content. Each of these hosts has associated instances of common_log and error_log that differ from those for www.cs.washington.edu in that they accumulate for a full week, rotating on Saturday night at midnight. These logs are kept for one full week after they are closed. Examples are /cse/www/logs/sigmetrics/sigmetrics_log and /cse/www/logs/sigmetrics/sigmetrics_errors.

For www.cs.washington.edu, all of these logs can be read from any host that mounts /cse/www, which includes most or all of the "research" Unix hosts (but not "instructional" hosts; instructional users should see the section on split logs for information on how to get log reports). CSEResearch Windows hosts can get at the logs by mounting the share \\ntdfs\cs\cse\.

Details of the format of common_log are here (local users only).

Here is Apache documentation for mod_log_config.
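As a rough illustration of working with common_log entries, the sketch below parses one line into named fields. It assumes the entries follow Apache's standard "combined" layout, which matches the fields listed above; the actual LogFormat used at CSE may differ, and the names COMBINED_RE and parse_line are hypothetical.

```python
import re

# Assumption: Apache "combined" layout. The real CSE LogFormat may differ.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no bytes were sent; treat that as zero.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```

Lines that do not match the assumed layout return None rather than raising, so a caller can skip them.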

The error log can be very handy for debugging CGI scripts and detecting broken links.

The anonymized_log (when provided) is made available for research and instructional use by users who aren't members of the CSE research community.
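The rewriting described for anonymized_log can be sketched as follows. This is only an illustration of the idea (a 1:1 mapping from hosts and usernames to integers, with the logname replaced by "-"), not the script that actually runs at CSE. It assumes the first three space-separated fields of each line are host, logname, and username; the class name Anonymizer is hypothetical.

```python
class Anonymizer:
    """Map each distinct host or username to its own integer (1:1)."""

    def __init__(self):
        self._ids = {}

    def _id_for(self, key):
        # Assign the next unused integer the first time a key is seen.
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return str(self._ids[key])

    def anonymize(self, line):
        # Assumed field order: host, logname, username, then the rest.
        parts = line.split(" ", 3)
        if len(parts) < 4:
            return line  # not a recognizable entry; leave it untouched
        host, _logname, user, rest = parts
        host_id = self._id_for(("host", host))
        # A username of "-" means none was present, so it stays "-".
        user_id = user if user == "-" else self._id_for(("user", user))
        return " ".join([host_id, "-", user_id, rest])
```

The same Anonymizer instance has to be used for a whole log so that repeated hosts map to the same integer.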

Logfile Archives

error_log is kept for one day after it is created. The log for the previous day can be found in /cse/www/logs/error_log.<date>. common_log and usertrack_log are kept for seven days, in /cse/www/logs/common_log.<date>. The <date> filename extension is in the format YYYYMMDD; for example, 20000202 is the extension for February 2, 2000. suexec_log is removed entirely each night.
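For scripts that need to locate an archived log, the date extension can be built as in this small sketch. The helper names archived_log_path and previous_day_error_log are hypothetical; the directory and the YYYYMMDD format come from the description above.

```python
from datetime import date, timedelta

LOG_DIR = "/cse/www/logs"


def archived_log_path(log_name, day):
    """Build the archive path for a log, using the YYYYMMDD extension."""
    return f"{LOG_DIR}/{log_name}.{day.strftime('%Y%m%d')}"


def previous_day_error_log(today):
    """Path of the one archived error_log, which covers the previous day."""
    return archived_log_path("error_log", today - timedelta(days=1))
```

For example, archived_log_path("common_log", date(2000, 2, 2)) yields /cse/www/logs/common_log.20000202.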

Split Logs

Each night near midnight, a set of split logs is generated. A split log contains the entries from common_log for a single top-level document tree and is named after the tree it represents. For example, rose.log contains the entries for accesses to /homes/rose/. Besides the trees in /homes/, split logs are also created for each tree in /research/projects/.

Split logs are stored in /cse/www/logs/split/, and are kept for one day only.
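The splitting step might look roughly like the sketch below, which groups log lines by the tree their request path falls under. It is not the actual nightly script. The naming of split logs for /research/projects/ trees (project name plus ".log") is an assumption by analogy with rose.log, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Pull the URL path out of the quoted request, e.g. "GET /homes/rose/ HTTP/1.0".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')


def split_log_name(url_path):
    """Return e.g. 'rose.log' for '/homes/rose/x.html', or None if outside the trees."""
    parts = url_path.split("/")  # '/homes/rose/x' -> ['', 'homes', 'rose', 'x']
    if len(parts) >= 3 and parts[1] == "homes" and parts[2]:
        return parts[2] + ".log"
    if (len(parts) >= 4 and parts[1] == "research"
            and parts[2] == "projects" and parts[3]):
        # Assumed naming for project trees; the source only shows rose.log.
        return parts[3] + ".log"
    return None


def split_entries(lines):
    """Group log lines by the split log they would belong to."""
    groups = defaultdict(list)
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        name = split_log_name(match.group(1))
        if name is not None:
            groups[name].append(line)
    return dict(groups)
```

Entries for documents outside /homes/ and /research/projects/ are simply dropped, since no split log is described for them.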

Instructional users, who don't have access to the file systems containing the logs, can subscribe to a daily log report: the split log corresponding to their content will be mailed nightly to each such user for whom a file called .splitinfo is found in their www/ directory. The contents of the file are ignored.

Server Statistics

Each night, a report summarizing the logs is generated. Server statistics are maintained here. This report is accessible to local users only.

Those statistics are generated by analog. Local documentation for analog is here.

Weekly Access Reports

Publishers of web documents on www.cs.washington.edu can request a weekly report on accesses to their documents. It works like this:

  1. User creates a file called .statinfo in the root of some document tree. The file contains one or more email addresses, and must be world readable.
  2. On Monday morning at 3:00AM a script runs that collects information about accesses to all the documents for the past week and mails the information for any document tree to the addresses in the corresponding .statinfo file.
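The requirements in step 1 can be checked with a short sketch like the one below. It assumes the addresses in .statinfo are separated by whitespace (the exact file layout is not specified here) and that the file lives on a POSIX file system; read_statinfo is a hypothetical helper, not part of the actual report script.

```python
import os
import stat


def read_statinfo(path):
    """Return the addresses in a .statinfo file, or [] if missing or not world readable."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return []  # no such file: this tree has not requested a report
    if not mode & stat.S_IROTH:
        return []  # the file must be world readable
    with open(path) as handle:
        # Assumption: addresses are separated by spaces or newlines.
        return handle.read().split()
```

A user can make the file world readable with chmod 644 .statinfo.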

The .statinfo files are sought in all the subdirectories of the following URL trees:

  /homes/
  /research/projects/

This means that /homes/turing/.statinfo will be sought but, for example, /homes/turing/awards/.statinfo will not be. For Dr. Turing, this means that his .statinfo file should be created in ~turing/www/. The report gives a count of the accesses for the past week to each document, plus the names of the referring documents (when available).

So, if, for example, there were fifty accesses to http://www.cs.washington.edu/homes/rose/ in a week, forty of which were from http://www.fbi.gov/least-wanted.html, five of which were from http://www.cs.washington.edu/lab/staff.html, and five with no referer info, the report would contain text roughly like this:

  /homes/rose/index.html (50 accesses)
     <= http://www.fbi.gov/least-wanted.html (40 accesses)
     <= http://www.cs.washington.edu/lab/staff.html (5 accesses)

To start receiving such reports, a user must create a .statinfo file in a subdirectory of the URL /homes/ or the URL /research/projects/. To turn them off, remove the .statinfo file. To get reports on some other tree not rooted in /homes/ or /research/projects/, contact the webmaster to request that the script be modified to look for .statinfo files there.
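To make the shape of the weekly report concrete, here is a minimal sketch that tallies accesses and referring documents and prints them in roughly the layout of the example above. It is an illustration only, not the script that runs on Monday mornings; format_report is a hypothetical name, and the input is assumed to be (document path, referer) pairs already extracted from the logs, with "-" or an empty string when no referer was recorded.

```python
from collections import Counter, defaultdict


def format_report(accesses):
    """Build report text from (document_path, referer) pairs."""
    totals = Counter()
    referers = defaultdict(Counter)
    for path, referer in accesses:
        totals[path] += 1
        # Accesses without referer information count toward the total only.
        if referer and referer != "-":
            referers[path][referer] += 1
    lines = []
    for path in sorted(totals):
        lines.append(f"{path} ({totals[path]} accesses)")
        # List referring documents from most to least frequent.
        for referer, count in referers[path].most_common():
            lines.append(f"   <= {referer} ({count} accesses)")
    return "\n".join(lines)
```

Feeding it forty accesses referred by one page, five by another, and five with no referer reproduces the three-line example: the document total is 50, and only the two known referers are listed.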


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to webmaster]