| Guidelines for Writing HTTP Server Scripts |
This document is intended to be an evolving series of ideas, pointers and other information about writing programs that can be executed by following a WWW link, with particular emphasis on security issues.
The rest of this document assumes that you already know how to write programs. It also doesn't attempt to cover the same ground as NCSA's introduction to forms, which should be considered the starting point for explorations of forms. You should also be sure to read the Common Gateway Interface documentation, which describes the interface that the HTTP server defines between HTML messages and server-side executables.
The potential problems with security cannot be overemphasized. Unlike existing network protocols, which generally allow either:
There are some basic features of server-side scripts that, if used correctly, will minimize the potential for security problems:
| The Basic Idea |
When you follow a link using a URL of the form:
    http://foo.bar.baz/a/b/c
the HTTP server at foo.bar.baz will check each successively longer prefix of
/a/b/c (i.e. /a,
/a/b, etc.) against the list of "ScriptAliases" defined
in the server's configuration files. A ScriptAlias looks like this:
    ScriptAlias /a /some/other/place/in/the/filesystem/a
which the server interprets to mean: if anyone ever references
/a/something, then execute
/some/other/place/in/the/filesystem/a and return its
output. Note that this implies two things about the executed program:
it must send a MIME Content-Type header as its first line of output,
to tell the client (Mosaic) what the output actually is (HTML? GIF?
JPEG? etc.), and then it should send some "useful" output, even if it's
only an "OK, message received" line. See the mail-request
mentioned below for an example of how to do this.
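As a rough illustration of those two requirements, here is a minimal sketch of a server script in Perl. The helper name response and the message text are made up for this example. The blank line after the header is what the Common Gateway Interface uses to separate headers from the body.

```perl
#!/usr/bin/perl
# Minimal server script: header first, then a blank line, then the body.
use strict;
use warnings;

# Build the complete response as one string so it is easy to inspect.
# (response is a hypothetical helper, not part of any library.)
sub response {
    my ($message) = @_;
    return "Content-Type: text/html\n"    # tells the client what follows
         . "\n"                           # blank line ends the headers
         . "<html><body><p>$message</p></body></html>\n";
}

print response("OK, message received");
```

Anything printed before the Content-Type line (a stray debugging message, for instance) would be taken as a malformed header, so it is worth building the whole reply before printing any of it.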
Only programs located in places referenced by a ScriptAlias will ever be executed by the server. In addition, the server caches a directory listing of all the programs in each location referenced in a ScriptAlias whenever it is started (or restarted), and uses it to check possible server-side programs before executing them. This prevents random programs placed in the right place from being accessed without a server restart (which only a privileged user can do).
| What about arguments? What about input? |
Once the HTTP server has discovered that /a/b is actually
an executable program in a ScriptAlias location, it executes the
program, passing it data in two ways.
First of all, any text left over from the URL that has not been "used"
to find the script will be used to set the value of an environment
variable named PATH_INFO. In the example above, this would be relatively
simple: PATH_INFO would just be /c. However, nearly
arbitrary text can be used here:
    http://foo.bar.baz/a/b/long=4748.39?//limit:=$!!:h+aposto:*&%&^$$#{fhfh}
This will result in PATH_INFO being set to:
    /long=4748.39?//limit:=$!!:h+aposto:*&%&^$$#{fhfh}
(note the initial `/'). The main restriction is that spaces are not
allowed, or rather, will terminate the component of the URL used to
set PATH_INFO.
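A script might pick up PATH_INFO along the following lines. This is only a sketch: the helper names path_info and path_components are invented for the example, and it treats the value as untrusted text rather than as a filename.

```perl
use strict;
use warnings;

# Return the extra path text the server passed along, or '' if none.
# Takes a hash reference (normally \%ENV) so it can be tried by hand.
sub path_info {
    my ($env) = @_;
    return defined $env->{PATH_INFO} ? $env->{PATH_INFO} : '';
}

# Split the leftover path into components, dropping the leading '/'.
sub path_components {
    my ($info) = @_;
    return grep { length } split m{/}, $info;
}

print "Content-Type: text/plain\n\n";
print "PATH_INFO is '", path_info(\%ENV), "'\n";
print "component: $_\n" for path_components(path_info(\%ENV));
```

For the first example URL above, PATH_INFO is /c and the only component is c.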
In addition, if you are using a forms interface, the values of all the
<input> and <select> tags in the
form will be made available, as the standard input of the
program.
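Reading that standard input could look roughly like this. The sketch assumes the server sets the CONTENT_LENGTH environment variable to the number of bytes to expect, as the Common Gateway Interface describes for forms submitted with POST. The name read_form_data and the 65536-byte cap are arbitrary choices for the example.

```perl
use strict;
use warnings;

# Read up to $length bytes of form data from $fh. Returns '' when the
# length is missing, malformed, or larger than $max (an arbitrary cap
# chosen for this sketch so a client cannot make the script read forever).
sub read_form_data {
    my ($fh, $length, $max) = @_;
    $max = 65536 unless defined $max;
    return '' unless defined $length && $length =~ /\A\d+\z/;
    return '' if $length > $max;
    my ($data, $got) = ('', 0);
    while ($got < $length) {
        # read() may return fewer bytes than asked for, so loop.
        my $n = read($fh, $data, $length - $got, $got);
        last unless $n;                   # end of input or read error
        $got += $n;
    }
    return $data;
}

my $form_data = read_form_data(\*STDIN, $ENV{CONTENT_LENGTH});
print "Content-Type: text/plain\n\n";
print "Received ", length($form_data), " bytes\n";
```

The data returned here is still in its encoded form.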
| Encoding |
The form data will be encoded to guarantee safe transmission. This encoding is an important issue, because to make reasonable use of the data sent to your program, you need to decode it.
You can do this in Perl quite easily:
    sub parseQuery {
        local($query_string) = @_;
        local(@q, $pair, $key, $value);
        # break up into individual name/value pairs
        @q = split(/&/, $query_string);
        foreach $pair (@q) {
            # convert + into space
            $pair =~ s/\+/ /g;
            # break the name/value pair up
            ($key, $value) = split(/=/, $pair, 2);
            # un-urlencode the key and value (%XX is one byte in hex)
            $key =~ s/%([0-9A-Fa-f]{2})/pack("H*",$1)/ge;
            $value =~ s/%([0-9A-Fa-f]{2})/pack("H*",$1)/ge;
            # results go in the global %query; if a key appears more
            # than once, its values are joined with "\0"
            $query{$key} .= "\0" if (defined($query{$key}));
            $query{$key} .= $value;
        }
        return 1;
    }
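To make the encoding concrete, here is a small self-contained sketch that decodes one made-up form submission. The helper url_decode and the sample field names are hypothetical. It applies the same two rules as the routine above: + stands for a space, and %XX stands for the byte with that hexadecimal value.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a hex byte.
# The '+' step must come first so that an encoded plus (%2B) survives.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Hypothetical form submission with two fields.
my $encoded = 'name=John+Smith&comment=100%25+sure%21';

for my $pair (split /&/, $encoded) {
    my ($key, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;    # a field sent without '='
    printf "%s => %s\n", url_decode($key), url_decode($value);
}
# prints:
#   name => John Smith
#   comment => 100% sure!
```

Splitting on & and = before decoding matters: a decoded value may itself contain either character.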
For other scripting languages (like sh or awk), a filter
called urldecode can be used by your own programs to do
the decoding. Invoke it as:
    /cse/www/htbin-post/urldecode
and it will convert any encoded data read from its standard input into its original form on its standard output. At some point, I'll add an object module you can link with to do this from a compiled language like C (although you may get there before me, since the encoding is so simple, and the source code for urldecode is there).
More details are available about writing server scripts in the Common Gateway Interface documentation, where a number of other environment variables that are available to the program are described.
| How to do this locally |
Send mail to webmaster@cs and ask. The webmaster will
give you yet another stern lecture about all this stuff, and then will
set you up with a directory in /cse/www/htbin-post/ where any server
scripts must live.
I want to reiterate that this is potentially a big security issue. Please take care in how you handle arguments, how you handle input, and what your program does or might do.
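One common way to take that care is to accept only input that matches a narrow pattern and to reject everything else. The sketch below is one possible approach, not a complete defence. The name safe_word, the 64-character limit, and the finger command in the final comment are all assumptions made for the example.

```perl
use strict;
use warnings;

# Accept only short values that start with a letter or digit and contain
# nothing but letters, digits, '.', '_' and '-'. Anything else (spaces,
# ';', '|', backquotes, '/', a leading '-') is rejected.
sub safe_word {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A([A-Za-z0-9][A-Za-z0-9._-]{0,63})\z/ ? $1 : undef;
}

# Hypothetical field value as it might arrive from a form.
my $user = safe_word('fred; rm -rf /');
print defined $user ? "accepted: $user\n" : "rejected\n";

# If an external program really must be run, the list form of system()
# hands the arguments straight to the program without involving a shell:
#   system('/usr/bin/finger', $user) if defined $user;
```

Listing what is allowed is generally easier to get right than listing what is forbidden, since a forgotten character in a forbidden list becomes a hole.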
For the time being, all instances of these programs will run as the uid "nobody". Notice also that the programs run on the machine www.cs.washington.edu, which is a Pentium Pro 200 running Linux 2.0.x. If your server "script" won't run in such an environment, you're in trouble.
Also, access to both areas (the private one and the 590i area) is currently limited to machines in the .cs.washington.edu domain. This restriction is inconvenient, and is intended to create a temporary breathing space so that we can get more experience with potential security issues.