University of Washington Department of Computer Science & Engineering
 User CGI for CSE Users


Executive Summary

Support for user CGI and PHP scripts is now provided on abstract.cs.washington.edu, a lab-supported Linux machine running Apache httpd 2.2. If you have a CSE account, you probably already have an account on abstract, and therefore the ability to create your own CGI scripts.


About CGI Scripts

CGI scripts are special programs that have been designed to be run by a web server. The acronym stands for "common gateway interface," meaning that the CGI mechanism is intended to be a gateway to other types of information beyond static files.

While we call them scripts, there is no requirement that a CGI script be coded in a scripting language such as perl, shell, or python, though those are all common ways to do it. Compiled languages such as C can also be used.

What makes a CGI script different from other programs is fairly minor. Most notably, it must begin its output with a little bit of HTTP header information (typically a Content-type line) followed by a blank line. Second, the server sets up a number of environment variables that help the script understand the context in which it's being run.

Here is a simple perl CGI program that illustrates both of these concepts:

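  #!/usr/bin/perl
  # Header and blank line first; text/plain keeps the browser from treating
  # the echoed HTTP_USER_AGENT (set by the server) as HTML.
  print "Content-type: text/plain\n\n$ENV{'HTTP_USER_AGENT'}\n";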

The CGI/1.1 specification is here. Apache documentation for mod_cgi, the component of Apache that handles CGI scripts, is here.
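
The same two ideas, headers first and context from environment variables, extend to the other variables CGI/1.1 defines. Here is a slightly longer perl sketch that reports a handful of them. The variable list is only a sample, and the subroutine name cgi_report is made up for illustration. It sends text/plain rather than text/html so that nothing a remote user supplies (such as QUERY_STRING) can be interpreted by the browser as markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A sample of the variables defined by CGI/1.1; Apache sets others too.
my @VARS = qw(REQUEST_METHOD QUERY_STRING REMOTE_ADDR SERVER_NAME SCRIPT_NAME);

# Build the whole response (header, blank line, body) from an
# environment hash. cgi_report is a hypothetical helper name.
sub cgi_report {
    my ($env) = @_;
    my $out = "Content-type: text/plain\n\n";
    for my $name (@VARS) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $out .= "$name = $value\n";
    }
    return $out;
}

# When run by the web server (not loaded by another script),
# report on the real environment.
print cgi_report(\%ENV) unless caller;
```

Saved in your web directory as, say, env.cgi (with permissions suEXEC accepts) and requested with ?x=1 appended to its URL, it should show QUERY_STRING = x=1.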


About Your Abstract Account

abstract.cs.washington.edu is a lab-supported Linux machine that runs the Apache web server, version 2.2 (local documentation is here). It has been configured to run CGI scripts for all account holders in a mode called suEXEC, in which CGI scripts run under your own user identity. In other words, a CGI script of yours can take any action on the computer that you can: both a useful and a scary fact.

You can mix your CGI scripts right in with your static content. By default, anything with a filename extension of .cgi, .cl, .lsp, .lisp, or .pl will be treated as a CGI script; that is, the web server will try to run it instead of just passing its contents over to the browser. You can change the list of such extensions, or designate individual files to be treated as CGIs, using Apache directives in a .htaccess file. (Details are left as an exercise for the reader.)

Lab staff has created accounts on abstract for all grads, undergrads, faculty, and staff. To log in, use your Kerberos credentials.

Your home directory on abstract is distinct from that on any other host, and is /www/homes/<user>/. Your web content is in /www/homes/<user>/www/. The corresponding URL is http://abstract.cs.washington.edu/~<user>/.

abstract:/www/homes/ is exported to other CSE Unix hosts as /homes/abstract/, so your home directory is available on those hosts as /homes/abstract/<user>/ (and, on research hosts, also as /cse/abstract/<user>).
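
Because the layout is regular, translating between a file's location and its URL is mechanical. The perl sketch below does that translation, assuming exactly the layout described above; the subroutine names path_to_url and nfs_path are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Layout taken from the description above.
my $HOMES = '/www/homes';
my $BASE  = 'http://abstract.cs.washington.edu';

# Hypothetical helper: turn a file path under someone's web tree into
# its URL. Returns undef for paths outside /www/homes/<user>/www/.
sub path_to_url {
    my ($path) = @_;
    return undef unless $path =~ m{\A\Q$HOMES\E/([^/]+)/www(/.*)?\z};
    my ($user, $rest) = ($1, defined $2 ? $2 : '/');
    return "$BASE/~$user$rest";
}

# Hypothetical helper: the same home directory as seen from
# other CSE Unix hosts.
sub nfs_path {
    my ($user) = @_;
    return "/homes/abstract/$user/";
}

print path_to_url($ARGV[0]) || "not under a web tree", "\n" if @ARGV;
```

For example, /www/homes/alice/www/cgi/hello.cgi maps to http://abstract.cs.washington.edu/~alice/cgi/hello.cgi.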


About Security

Unlike almost all other programs on a computer, CGI scripts can be run by arbitrary users on the public internet (this is also true of special purpose network service providers such as the ftp service, when it's enabled). Some of those users would enjoy the opportunity to subvert the intention of your program to their own creative ends.

It's quite hard to write a CGI script that can't be subverted. Security alerts reporting exploits of common CGIs abound. That's why we don't enable user CGI execution on hosts like www.cs.washington.edu, which serve mission-critical department functions.

Here's how we manage security risks on abstract:

  • Typically, CGI scripts on an Apache server run under the unprivileged "apache" account, which sharply limits what actions a CGI script can take. But it also means that every user's CGI scripts run as the same account, so any one of them can manipulate data written by any other user's scripts. That's a poor choice for the abstract environment. Instead, we use the suEXEC mechanism so that CGI scripts run as the script's owner.
  • suEXEC itself imposes some rules that are intended to enhance security. For example, you must own both the script and the directory containing it, neither may be group-writable, and the group of both must match your own primary group. See the documentation for suEXEC for the full story.
  • Because the scripts run with the privileges of their owner, we don't export files from other servers to abstract (except /uns/, which is exported read-only). That means that only files on abstract can be directly manipulated by CGI scripts running on abstract.
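
Because a suEXEC refusal shows up only as a server error, it can help to check a script against the rules above before blaming the server. Below is a rough perl sketch of just the checks listed here (owner, group, and group write permission on the script and its directory); suEXEC's real checklist is longer, and the subroutine name suexec_problems is hypothetical. As an extra precaution it also flags world-writable files and directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical pre-flight check: report (some of) the reasons suEXEC
# would refuse to run $script for the current user. An empty list only
# means these particular checks passed, not that suEXEC will be happy.
sub suexec_problems {
    my ($script) = @_;
    my @problems;
    my $uid = $<;                          # real user id
    my $gid = (getpwuid($uid))[3];         # primary group from passwd
    my $dir = dirname(File::Spec->rel2abs($script));

    for my $path ($script, $dir) {
        my @st = stat $path;
        unless (@st) {
            push @problems, "$path: cannot stat ($!)";
            next;
        }
        my ($mode, $owner, $group) = @st[2, 4, 5];
        push @problems, "$path: not owned by you" if $owner != $uid;
        push @problems, "$path: group does not match your primary group"
            if defined $gid && $group != $gid;
        push @problems, "$path: group-writable" if $mode & 020;
        push @problems, "$path: world-writable" if $mode & 002;
    }
    return @problems;
}

if (@ARGV) {
    my @p = suexec_problems($ARGV[0]);
    print @p ? map("$_\n", @p) : "no problems found by these checks\n";
}
```

If a script reports group-writable, chmod g-w on it (or on its directory) is the usual fix.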

Here are a few suggestions for keeping your CGI scripts from being turned against you:

  • Don't make it easy to move from abstract to another host without credentials. Specifically, don't mention abstract in any .rhosts file: a malicious user who succeeded in exploiting a flaw in one of your CGIs could then leverage that access to affect your data on another host. Think rsh june /bin/rm -rf ..
  • The cardinal rule in CGI scripting is that user input is tainted. In particular, if you allow unchecked user input from an HTML form to specify the name of a file that is subsequently opened, all is lost: Voldemort's thousand-year reign on earth will begin immediately. In perl, a handy construct for sanitizing user input is
     $data =~ s/[^A-Za-z0-9_]/_/g;
    What that says is "replace every character in that variable (which is presumed to hold the value a remote user entered into an HTML form) that isn't a letter, digit, or underscore with an underscore."
  • Perl offers "taint mode," which helps you write secure programs, particularly for the case where you are processing arbitrary user input. See perldoc perlsec for details.
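
Here is one way those ideas fit together in perl: rather than repairing bad characters, accept a form value only if it matches a strict pattern, and use the captured text, which is the untainting technique perlsec describes. The data directory, the 64-character limit, the .txt suffix, and the subroutine names are all assumptions made up for this sketch. To turn taint checking on for a real CGI script, put -T on its #! line (for example #!/usr/bin/perl -T); the sketch itself runs with or without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory holding the files a form is allowed to name.
my $DATA_DIR = '/www/homes/YOURNAME/data';

# Hypothetical helper: return an untainted copy of $name if it is 1 to 64
# letters, digits, or underscores; otherwise undef. Under -T, only the
# captured $1 is untainted, not the original $name.
sub safe_name {
    my ($name) = @_;
    return undef unless defined $name;
    # \A and \z anchor the whole string; $ would also accept a
    # trailing newline, which we do not want.
    return $name =~ /\A([A-Za-z0-9_]{1,64})\z/ ? $1 : undef;
}

# Hypothetical helper: open a file in $DATA_DIR named by untrusted input.
# Dots and slashes can never get through safe_name, so "../" tricks fail.
sub open_user_file {
    my ($raw) = @_;
    my $name = safe_name($raw);
    die "rejected file name\n" unless defined $name;
    open my $fh, '<', "$DATA_DIR/$name.txt"
        or die "cannot open $name.txt: $!\n";
    return $fh;
}
```

Checking against a whitelist and rejecting anything else is stricter than the substitution above: a value like ../etc/passwd is refused outright instead of being quietly turned into a different, legal-looking name.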

The World Wide Web Security FAQ, hosted at W3, has a section on CGI scripts, written by Lincoln Stein.


Tips on Writing CGI Scripts

  • Debugging CGI scripts can be a real treat-- your only immediate evidence that things have gone horribly wrong is an HTTP code 500 server error. A handy fact: anything your script prints on the standard error (such as compilation or runtime diagnostics) is collected in the server error log, which, at our site, is /www/htdocs/logs/error_log. So, your first question after your server error should be "what does the error log tell me about this?" To help you answer that question, we offer this error log filter.
  • Authentication works the same way with CGI scripts as it does with HTML files: you can choose from the usual smörgåsbord of IP/hostname, CSENetID, UWNetID, and basic authentication. One handy fact: if you require a username/password to gain access to your script, the REMOTE_USER environment variable will be set to the username of the authenticated user. If you use CSENetID, that will be their CSE username. See Controlling Access to Your Documents for information on writing .htaccess files.
  • Many CGI coders like to use perl to program their scripts because it is nicely "impedance-matched" to the web. (Python is another excellent choice, while some people like to use shell or lisp.) We offer version 5.6.1 as /usr/bin/perl, and the uns people currently offer version 5.8.0 as /uns/bin/perl. The easiest way we know to write a perl-coded CGI script is to use Lincoln Stein's CGI module. Type perldoc CGI at a command prompt for details. Type perldoc perllocal to see what third-party modules we support. For persistent storage of data, may we suggest the Storable module? DB_File is another good choice. BTW, if your perl script won't compile, it won't run. Type perl -c <script> at a command prompt to compile your script without running it.
  • Remember that suEXEC imposes special constraints on your scripts. If, for example, the script or the directory it lives in is group-writable, it won't run (HTTP code 500 server error), and figuring out why it won't run can be challenging. The documentation for suEXEC has a security checklist that you should visit whenever you are "sure" that a script "should" run. No, the server isn't broken.
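
Several of these tips combine naturally. The perl sketch below is a per-user visit counter: it trusts REMOTE_USER (so it assumes the script is protected by one of the authentication methods above), keeps its data in a Storable file, and uses warn, whose output lands in the error log, for diagnostics. The data file path, the separate lock file, and the subroutine name bump_count are assumptions for illustration. Because suEXEC runs the script as you, the data file it writes belongs to you, not to the apache account.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Storable qw(nstore retrieve);

# Hypothetical data file in your abstract home directory.
my $DATA = '/www/homes/YOURNAME/visits.db';

# Hypothetical helper: add one to $user's count in $file and return the
# new count. An exclusive flock on a separate lock file makes the whole
# read-modify-write step safe against simultaneous requests.
sub bump_count {
    my ($file, $user) = @_;
    open my $lock, '>>', "$file.lock" or die "cannot open lock: $!";
    flock $lock, LOCK_EX or die "cannot lock: $!";
    my $counts = -e $file ? retrieve($file) : {};
    $counts->{$user}++;
    nstore($counts, $file) or die "cannot store: $!";
    close $lock;                     # closing also releases the lock
    return $counts->{$user};
}

# Main program: runs only when executed directly, not when loaded.
unless (caller) {
    my $user = $ENV{REMOTE_USER};
    if (!defined $user || $user eq '') {
        # warn goes to the server error log, not to the browser.
        warn "visit counter: REMOTE_USER not set; is .htaccess in place?\n";
        print "Status: 403 Forbidden\nContent-type: text/plain\n\n",
              "Please log in first.\n";
    } else {
        my $n = bump_count($DATA, $user);
        print "Content-type: text/plain\n\nHello, $user; visit number $n.\n";
    }
}
```

Running perl -c on it first, as suggested above, catches compile errors before they turn into a 500.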


PHP

Abstract also supports PHP, a scripting language that allows you to embed statements in an HTML file. The concept is much like Microsoft's Active Server Pages (which PHP historically precedes), but without ASP's language independence. As a language, PHP is sort of like perl.

There is documentation for PHP. You might consider starting with the FAQ.

PHP is configured at build-time to support any of a vast array of extensions. You can see how we built ours by visiting PHP Info for Abstract.

Files on Abstract that have the filename extension .php will be parsed for PHP language constructs. Files named index.php will be processed as the directory index for the containing directory if no other directory index files (such as index.html) exist in the directory.

At this writing, we run PHP version 5.2. NB: starting with version 4.2.0, PHP has a different default setting for the "register globals" behavior (off) than all previous versions did; that tends to break a lot of older code. Information on this issue can be found in Using Register Globals and in Working with Form Variables in PHP 4.2+. A quick workaround is to create a .htaccess file in the directory containing your code with this content:

  php_flag register_globals on
(It's quick, but not recommended. Better is to live in and code for the present. Register Globals won't be back, but if it did return, we'd expect it to be riding an eyeless horse, smelling of sulphur, wearing a black cloak, and carrying a scythe.)

(PHP used to stand for "Personal Home Page," but now they pretend it's for "Hypertext PreProcessor.")

NB: your data is not secure when managed by PHP. PHP behaves a lot like CGI, but it doesn't run under the suEXEC mechanism. Your CGI scripts run as you. Your PHP scripts run as the web server user (apache). Your colleagues' PHP scripts also run as the web server user. Data managed by your PHP scripts is writable by the web server user, trashable by the web server user, and therefore trashable by your colleagues' PHP scripts. If you choose to use PHP to write to the file system, make sure that the data is not sensitive or valuable. Keep your own backups of the data. Consider using CGI instead of PHP to manage data.


Server-Side-Includes

Server side includes, AKA "SSI," is a technique for adding dynamic content to HTML pages. This is done by crafting HTML comments with special syntax-- called "SSI directives"-- inline with your HTML content.

The possibilities with SSI are much more limited than with PHP, but, on the upside, you don't need to learn an entire programming language to use it. You can set, test, and echo variables, include the output of programs or CGI scripts, include external files, and do a few other little tricks.

Please read Apache Tutorial: Introduction to Server Side Includes in the Apache 1.3 documentation for details.

To make SSI work on your content, you need to do two things:

  1. Specify that SSI be enabled for your web tree. The simplest way to do this is to add the following line to your .htaccess:
      Options +Includes
    
  2. Specify which files are processed for SSI directives. To get all .html files processed for SSI, you could add this line to your .htaccess:
      AddOutputFilter INCLUDES .html
    
    Alternatively, give the files you want parsed for SSI a .shtml extension; the server parses all such files for SSI directives.


About HTTPS/SSL on Abstract

Abstract offers SSL (HTTPS) service. That means that pretty much anything that's offered under http://abstract.cs.washington.edu/ is also available under https://abstract.cs.washington.edu/. To make this work, staff created an "SSL certificate" that identifies the server and contains the server's public key, which browsers use to set up an encrypted session.

The SSL certificate on Abstract was signed by the "UW Services CA" (administered by C&C), not by a commercial certificate authority, which would charge upwards of $150 for this service (how stingy is that?). What that penny-pinching move means is that most users who browse to https://abstract.cs.washington.edu/ (or anything that starts with it) will face down a mildly scary alert from their browser warning them that the certificate has a problem and asking whether they want to accept it. Therefore, if you choose to advertise your Abstract resources at an HTTPS URL, you might warn your users that the certificate is locally signed, or that they should expect the security dialog, or that they should install C&C's root CA certificate by browsing here.
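
A CGI script can tell which way it was reached. Apache's mod_ssl sets the HTTPS environment variable to "on" for requests that arrive over SSL; the perl sketch below assumes abstract behaves that way and that SSL is served on the default port. It sends plain-HTTP requests to the HTTPS form of the same URL, which is handy for a script that asks for passwords. REQUEST_URI (set by Apache) and SERVER_NAME supply the URL; the subroutine name https_redirect is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the redirect response a plain-HTTP request
# should get, or undef if the request already came in over SSL.
# Assumes mod_ssl sets HTTPS=on and SSL is on the default port (443).
sub https_redirect {
    my ($env) = @_;
    my $https = defined $env->{HTTPS} ? lc $env->{HTTPS} : '';
    return undef if $https eq 'on';
    my $host = defined $env->{SERVER_NAME} ? $env->{SERVER_NAME}
             : 'abstract.cs.washington.edu';
    my $uri  = defined $env->{REQUEST_URI} ? $env->{REQUEST_URI} : '/';
    # A Status header plus Location tells the server to issue a redirect.
    return "Status: 302 Found\nLocation: https://$host$uri\n\n";
}

unless (caller) {
    if (my $redirect = https_redirect(\%ENV)) {
        print $redirect;             # browser retries over HTTPS
    } else {
        print "Content-type: text/plain\n\nThis connection is encrypted.\n";
    }
}
```

Users following such a redirect to Abstract will still see the certificate warning described above unless they have installed C&C's root CA certificate.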

(We offer HTTPS services on most other CSE Apache web servers, too. www.cs.washington.edu and www4.cs.washington.edu do have commercially-signed certificates because we need to support external users there. Others don't.)

Closing Notes

By now, you are thinking "This document sucks! I don't have to put up with this @#$#%!" Right you are: your comments and criticisms are welcomed.

[Last modified: Monday, October 15, 2007, at 02:44PM PDT.]



Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX