Executive Summary

Support for user CGI and PHP scripts is provided on homes.cs.washington.edu, a lab-supported Linux machine running Apache httpd 2.2. If you have a CSE account, you probably already have an account on homes, and therefore the ability to create your own CGI and PHP scripts.

More information about homes.cs is in the Home Page FAQ.

About CGI Scripts

CGI scripts are special programs that have been designed to be run by a web server. The acronym stands for "common gateway interface," meaning that the CGI mechanism is intended to be a gateway to types of information beyond files with static content.

While we call them scripts, there is no requirement that a CGI script be coded in a scripting language such as perl, shell, or python, though those are all common ways to do it. Compiled languages such as C can also be used.

What makes a CGI script different from other programs is fairly minor. Most notably, the first output it produces must be a little bit of HTTP header information. Second, the server sets up a number of environment variables that help the script understand the context in which it's being run.

Here is a simple perl CGI program that illustrates both of these concepts:

  #!/usr/bin/perl
  print "Content-type: text/html\n\n$ENV{'HTTP_USER_AGENT'}\n";

The CGI/1.1 specification is here. Apache documentation for mod_cgi, the component of Apache that handles CGI scripts, is here. A simple tutorial is here.
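
For comparison, here is a minimal sketch of the same two ideas (header first, then environment variables) in Python, one of the other languages mentioned above. It assumes a Python 3 interpreter is reachable as python3 on the server, which this page does not confirm, and build_response is a name made up for the illustration. Unlike the perl one-liner, it escapes the user agent before placing it in HTML, since that value is supplied by the remote browser.

```python
#!/usr/bin/env python3
# Assumption: python3 is on the server's PATH; adjust the line above if not.
import html
import os
import sys


def build_response(environ):
    """Return the full CGI output: one header line, a blank line, then the body."""
    # The server passes request details in environment variables.
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    # The value comes from the remote browser, so escape it before embedding it.
    body = "<p>Your browser says it is: {}</p>\n".format(html.escape(agent))
    # The header block must come first and end with an empty line.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Keeping the response in a function that takes the environment as an argument makes it possible to try the script from a command prompt without a web server.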

About Your Homes Account

homes.cs.washington.edu is a lab-supported Linux machine that runs the Apache httpd web server, version 2.2 (documentation is here). It has been configured to run CGI scripts for all account holders in a mode called suEXEC, which means that CGI scripts run under your own user identity. That means that a CGI script of yours can take any action on the computer that you can, which is both a useful and a scary fact.

You can mix your CGI scripts right in with your static content. By default, anything with a filename extension of .cgi, .cl, .lsp, .lisp, .pl, .py, or .rb will be treated as a CGI script-- that is, the web server will try to run it as a CGI script instead of just passing the contents over to the browser. You can manipulate the list of such extensions-- or specify specific files to be treated as CGIs- using Apache directives in a .htaccess file. (Details are left as an exercise for the reader.) NOTE: To enable python scripts, you'll need to add the following line to your local .htaccess file:

addhandler cgi-script .py

Lab staff has created accounts for all grads, undergrads, faculty, and staff on homes.

Your home directory on homes is distinct from that on any other host, and is /www/homes/<user>/. Your web content is in /www/homes/<user>/public_html/. The corresponding URL is http://homes.cs.washington.edu/~<user>/.

homes:/www/homes/<user>/public_html/ is exported to other CSE Unix hosts as /cse/web/homes/<user>/.
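
As a quick reference, the mapping between an account name and these locations can be sketched in Python as follows. The function name homes_locations and the account name alice are made up for the illustration; the paths and URL are the ones given above.

```python
def homes_locations(user):
    """Return the locations described above for one account name."""
    home = "/www/homes/{}/".format(user)
    return {
        "home": home,                                        # home directory on homes
        "web_root": home + "public_html/",                   # where web content lives
        "url": "http://homes.cs.washington.edu/~{}/".format(user),
        "exported_as": "/cse/web/homes/{}/".format(user),    # path on other CSE Unix hosts
    }


if __name__ == "__main__":
    # "alice" is a hypothetical account name.
    for name, value in homes_locations("alice").items():
        print("{:12} {}".format(name, value))
```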

About Security

Unlike almost all other programs on a computer, CGI scripts can be run by arbitrary users on the public internet (this is also true of special purpose network service providers such as the ftp service, when it's enabled). Some of those users would enjoy the opportunity to subvert the intention of your program to their own creative ends.

It's quite hard to write a CGI script that can't be subverted. Security alerts reporting exploits of common CGIs abound. That's why we don't enable user CGI execution on hosts like www.cs.washington.edu, which serve mission-critical department functions.

Here's how we manage security risks on homes:

  • Typically, CGI scripts on an Apache httpd server run under the unprivileged "apache" account, which, because it is a member of no groups, sharply limits what actions a CGI script can take. But that means that any CGI script can manipulate the data of any other user, so it's a poor choice for the homes environment. Instead, we use the suEXEC mechanism so that CGI scripts run as the script owner.
  • suEXEC itself imposes some rules that are intended to enhance security. For example, you must own the script and the containing directory, and neither may be group-writable, and the group of both must match your own primary group. See the documentation for suEXEC for the full story.
  • Because the scripts are running with the privileges of the owner, we don't export files from other servers to homes (except, read only, /uns/). That means that only files on homes can be directly manipulated by CGI scripts running on homes.
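
The suEXEC rules named above (ownership, group, and no group-write permission on the script and its directory) can be checked before you ever load the page. The following Python sketch covers only those three rules, not the full suEXEC checklist, and the function names are made up for the illustration. It assumes that the uid and gid of the process running the check are your own uid and primary group.

```python
import os
import stat
import sys


def entry_problems(label, st_uid, st_gid, st_mode, uid, gid):
    """Check one file or directory against the three rules named above."""
    problems = []
    if st_uid != uid:
        problems.append("{} is not owned by uid {}".format(label, uid))
    if st_gid != gid:
        problems.append("{} does not belong to gid {}".format(label, gid))
    if st_mode & stat.S_IWGRP:
        problems.append("{} is group-writable".format(label))
    return problems


def suexec_problems(script_path, uid=None, gid=None):
    """Check a script and its containing directory; return a list of complaints."""
    # Assumption: the real uid and gid of this process are the account's own
    # uid and primary group, as in an ordinary login shell.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    script_path = os.path.abspath(script_path)
    problems = []
    for label, path in (("script", script_path),
                        ("directory", os.path.dirname(script_path))):
        info = os.stat(path)
        problems += entry_problems(label, info.st_uid, info.st_gid,
                                   info.st_mode, uid, gid)
    return problems


if __name__ == "__main__":
    # Usage: pass one or more script paths on the command line.
    for target in sys.argv[1:]:
        for problem in suexec_problems(target):
            print("{}: {}".format(target, problem))
```

An empty result means only that these three rules are satisfied; the suEXEC documentation remains the authority on everything else it checks.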

Here are a few suggestions for how to keep your CGI scripts from being turned against you:

  • Don't make it easy to move from homes to another host without credentials. Specifically, don't mention homes in any .rhosts file; that could enable a malevolent user who succeeded in exploiting a flaw in a CGI of yours to leverage their access to affect your data on another host. Think rsh barb /bin/rm -rf ..
  • The cardinal rule in CGI scripting is that user input is tainted. In particular, if you allow unchecked user input from an HTML form to specify the name of a file that is subsequently opened, all is lost-- Voldemort's thousand-year reign on earth will begin immediately. In perl, a handy construct for checking user input is
      $data =~ s/[^A-Za-z0-9_]/_/g;
    What that says is "replace any character in that variable (which is presumed to be the value entered into an HTML form by a remote user) that isn't alphanumeric or an underscore with an underscore."
  • Perl offers "taint mode," which helps you write secure programs, particularly for the case where you are processing arbitrary user input. See perldoc perlsec for details.
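
For scripts written in Python rather than perl, the same substitution can be sketched as follows. The names sanitize and data_file_for, and the idea of storing one .txt file per form value, are made up for the illustration.

```python
import os
import re

# Anything that is not an ASCII letter, a digit, or an underscore is unsafe.
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def sanitize(value):
    """Replace every unsafe character in a form value with an underscore."""
    return _UNSAFE.sub("_", value)


def data_file_for(base_dir, form_value):
    """Build a file name from user input without letting it leave base_dir."""
    # After sanitizing, the value contains no dots or slashes, so it cannot
    # name a parent directory or an absolute path.
    return os.path.join(base_dir, sanitize(form_value) + ".txt")


if __name__ == "__main__":
    print(sanitize("../../etc/passwd"))
```

With this in place, a hostile value such as ../../etc/passwd becomes ______etc_passwd, which names a harmless file inside the directory you chose.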

The World Wide Web Security FAQ, hosted at W3, has a section on CGI scripts, written by Lincoln Stein.

Tips on Writing CGI Scripts

  • Debugging CGI scripts can be a real treat-- your only immediate evidence that things have gone horribly wrong is an HTTP code 500 server error. A handy fact: anything your script prints on the standard error (such as compilation or runtime diagnostics) is collected in the server error log, which, at our site, is /www/htdocs/logs/error_log. So, your first question after your server error should be "what does the error log tell me about this?" To help you answer that question, we offer this error log filter.
  • Authentication works the same way with CGI scripts as it does with HTML files-- you can choose from the usual smörgåsbord of IP/hostname, CSENetID, UWNetID, and basic authentication. One handy fact: if you require a username/password to gain access to your script, the REMOTE_USER environment variable will be set to the username of the authenticated user. If you use CSENetID, that will be their CSE username. See Controlling Access to Your Documents for information on writing .htaccess files.
  • Many CGI coders like to use perl to program their scripts because it is nicely "impedance-matched" to the web. (Python is another excellent choice, while some people like to use shell or lisp.) We offer version 5.6.1 as /usr/bin/perl, and the uns people currently offer version 5.8.0 as /uns/bin/perl. The easiest way we know to write a perl-coded CGI script is using Lincoln Stein's CGI module. Type perldoc CGI at a command prompt for details. Type perldoc perllocal to see what third-party modules we support. For persistent storage of data, may we suggest the Storable module? DB_File is another good choice. BTW, if your perl script won't compile, it won't run. Type perl -c <script> at a command prompt to compile your script without running it.
  • Remember that suEXEC imposes special constraints on your scripts. If, for example, the script-- or the directory it lives in-- is group-writable, it won't run (HTTP code 500 server error), and telling why it won't run can be challenging. The documentation for suexec has a security checklist that you should visit whenever you are "sure" that a script "should" run. No, the server isn't broken.
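
Two of the tips above, diagnostics on standard error and the REMOTE_USER variable, can be combined in a small Python sketch. The names greet and respond are made up for the illustration, and the script assumes a Python 3 interpreter is available on the server. On failure it sends a Status header, which is part of the CGI/1.1 convention, and writes the traceback to standard error so that it lands in the server error log.

```python
#!/usr/bin/env python3
# Assumption: python3 is on the server's PATH; adjust the line above if not.
import os
import sys
import traceback


def greet(environ):
    """Example handler: greet the authenticated user, if there is one."""
    user = environ.get("REMOTE_USER", "anonymous")
    return "Hello, {}\n".format(user)


def respond(handler, environ):
    """Run a handler; return (text for stdout, diagnostics for stderr)."""
    try:
        body = handler(environ)
    except Exception:
        # Tell the browser little; keep the details for the error log.
        output = ("Status: 500 Internal Server Error\n"
                  "Content-Type: text/plain\n\n"
                  "Something went wrong.\n")
        return output, traceback.format_exc()
    return "Content-Type: text/plain\n\n" + body, ""


if __name__ == "__main__":
    output, diagnostics = respond(greet, os.environ)
    sys.stderr.write(diagnostics)   # collected in the server error log
    sys.stdout.write(output)        # sent to the browser
```

This catches only errors raised while the handler runs. A script that fails to start at all, for example because of a syntax error or a suEXEC rule, still produces the bare server error described above.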

PHP

Homes also supports PHP, a scripting language that allows you to embed statements in an HTML file. As a language, PHP is sort of like perl, but less elegant.

There is documentation for PHP. You might consider starting with the FAQ.

PHP is configured at build-time to support any of a vast array of extensions. You can see how we built ours by visiting PHP Info for Homes.

Files on Homes that have the filename extension .php will be parsed for PHP language constructs. Files named index.php will be processed as the directory index for the containing directory if no other directory index files (such as index.html) exist in the directory.

At this writing, we run PHP version 5.2. NB: versions of PHP later than 4.2.0 have a different default setting for the "register globals" behavior than did all previous versions; that tends to break a lot of older code. Information on this issue can be found in Using Register Globals and in Working with Form Variables in PHP 4.2+. A quick workaround is to create a .htaccess file in the directory containing your code with this content:

  php_flag register_globals on

(It's quick, but not recommended. Better is to live in and code for the present and near future. Register Globals won't be back, but if it did return, we'd expect it to be riding an eyeless horse, smelling of sulfur, wearing a black cloak, and carrying a scythe.)

(PHP used to stand for "Personal Home Page," but now they pretend it's for "Hypertext PreProcessor.")

NB: your data is not secure when managed by PHP. PHP behaves a lot like CGI, but it doesn't run under the suEXEC mechanism. Your CGI scripts run as you. Your PHP scripts run as the web server user (apache). Your colleagues' PHP scripts also run as the web server user. Data managed by your PHP scripts is writable by the web server user, trashable by the web server user, and therefore trashable by your colleagues' PHP scripts. If you choose to use PHP to write to the file system, make sure that the data is not sensitive or valuable. Keep your own backups of the data. Consider using CGI instead of PHP to manage data.

Server-Side-Includes

Server side includes, AKA "SSI," is a technique for adding dynamic content to HTML pages. This is done by crafting HTML comments with special syntax-- called "SSI directives"-- inline with your HTML content.

The possibilities with SSI are much more limited than with PHP, but, on the upside, you don't need to learn an entire programming language to use it. You can set, test, and echo variables, include the output of programs or CGI scripts, include external files, and do a few other little tricks.

Please read Apache Tutorial: Introduction to Server Side Includes in the Apache 2.2 documentation for details.

To make SSI work on your content, you need to do two things:

  1. Specify that SSI be enabled for your web tree. The simplest way to do this is to add the following line to your .htaccess:
      options +includes
    
  2. Specify which files are processed for SSI directives. To get all .html files processed for SSI, you could add this line to your .htaccess:
      addoutputfilter INCLUDES .html
    
    Alternatively, name the files you want parsed for SSI with a .shtml extension; the server parses all such SSI-enabled files for SSI directives.

About HTTPS/SSL on Homes

Homes offers SSL (HTTPS) service. That means that pretty much anything that's offered under http://homes.cs.washington.edu/ is also available under https://homes.cs.washington.edu/. To make this work, staff created an "SSL certificate" that identifies the server and contains a key for encrypting the session.

The SSL certificate on homes was signed by a commercial certificate authority.

(We offer HTTPS services on most other CSE Apache httpd web servers, too.)

Closing Notes

By now, you are thinking "This document sucks! I don't have to put up with this @#$#%!" Right you are: your comments and criticisms are welcomed.