Controlling Access to Your Documents
If you are a CSE web content provider, you
may find that the default access controls for your content aren't what
you need. For example, home pages default to being served only to
hosts in .cs.washington.edu - most people override that
default to publish more widely. To override the defaults, you need to
create an access control file. This document tells you how and will
even build the file for you (but you have to copy it into place).
CSE web servers support a small variety of schemes for restricting web
access to your documents. This document
- provides some background information
- summarizes the types of restrictions that you as an information
provider may apply
- provides form-based access to some of the tools used to manage
authorization files
[Last modified 09/30/08 at 07:04AM PDT.]
System-wide access policy is controlled by
system configuration files read by the server at startup (or when a
running server is signalled to re-read its configuration). Since it
is impractical for an administrator to edit those files to control
access to resources maintained by other content providers, there are
also mechanisms to permit you, as a web page author, to control
certain aspects of access to your documents.
The server we currently run on www.cs.washington.edu (Apache/2.2.14 (Fedora)) permits users to control read access to
their documents on a per-directory tree basis - that is, you
may specify who may read all the documents in a directory and
its descendants - or a per-file basis.
Our server is configured to allow the default system-wide read
policy to be overridden by rules in an access control file called
.htaccess. In order to allow the policy established in such
a file to apply to subdirectories, the server searches for such files
at each level in a URL.
For example, when the URL
http://www.cs.washington.edu/homes/turing/ is served, the
server looks for .htaccess files in the server root
/, in /homes/, and in /homes/turing/,
opening and processing each such file each and every time that URL is
requested. That can impact the performance of the server, so this
user-level access control is a poor choice for files that are
frequently requested.
Note that the server runs without any special privilege. So, just
as is necessary for any content that is served, the .htaccess
file must be world-readable. This is true of any other auxiliary
authorization control files.
Besides .htaccess files, there are these types of auth
control files:
- password files
- When basic auth is used, user names and
passwords provided by users are matched against those stored in a
password file. The passwords are encrypted in the
file. Conventionally, this file is called .htpasswd, a good
choice because our server will not serve files starting with
.ht.
- group files
- When you wish to allow access to a subset of users in an
authentication database, you can either list the users, or specify a
name of a group. The names of and members of the groups are listed in
a group file, conventionally called .htgroup. Group files are
most useful with CSENetID auth.
There are three types of auth that users may specify in an
access control file, and there are directives that allow you to
combine them:
- hostname-based auth
- Only users connecting from a restricted set of DNS domain names
are permitted access. For example, policy could be established
that permits only hosts from .cs.washington.edu
access.
- CSENetID auth
- Users must authenticate with their
CSE kerberos username and password. This requires an
"SSL-enabled" browser (such as any late model of Netscape or
Internet Explorer, but not Lynx) and results in a cookie that can be used to authenticate to any
resource protected by CSENetID. The cookie is valid for a limited
time (currently ten hours) or until the end of the browser
session (whichever comes first). More information is
here. This type of auth is a local extension to the web
server. (NB: www.cs.washington.edu supports CSENetID authentication
only on HTTP URLs, not HTTPS URLs.)
- basic auth
- Users are prompted by their browsers - if capable - for a
username and password. These are checked against a database of
users and encrypted passwords to control access. Typically, the
browser will cache the username and password so that accesses to
any document in the authentication realm will not require repeated
authorization during that browser session. The password is passed
over the wire unencrypted, which is why you should never use
your login password with this mechanism.
- hybrid auth
- Imagine that you have content that
- you wish to be conveniently available to CSE community members: those
browsing from within the .cs.washington.edu domain or who browse
from outside but have a CSE account
- you wish to be conveniently available to users browsing from within the
.cs.washington.edu domain, but that also needs to be available
to certain users outside the department
- you wish to be available only to certain users
and only when they are browsing from within the department
In such cases you can combine hostname-based with basic or CSENetID
authorization in either an "or" or an "and" relation to get such finer
control.
In addition, we offer support for a more powerful
variant of CSENetID known as pubcookie (AKA
"UWNetID") authentication. This type of authentication, which is
supported by UW Computing and Communications, is used only with
resources accessed via HTTPS (secure HTTP) URLs, and uses your UW
kerberos credentials instead of your CSE kerberos credentials. For
more information, see Pubcookie.
There are a variety of methods used to access web resources.
Reading a file, the typical case, uses the GET method. Sending data
to a script may use POST, GET, or (rarely) PUT
methods. Each of these access methods may be separately controlled (but the
authorization tool below generates a single control scheme for all the methods
you specify).
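As a minimal sketch of per-method control, assuming the server's override settings permit the <Limit> directive in .htaccess files (an assumption; nothing above confirms it), the following restricts only POST and PUT requests to local hosts and leaves GET under whatever policy otherwise applies:

```apache
# Sketch only: applies the host restriction to POST and PUT requests.
# Other methods, such as GET, are not affected by this block.
<Limit POST PUT>
order deny,allow
deny from all
allow from .cs.washington.edu
</Limit>
```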
Auth can get pretty expensive, in part because the authentication
control files are opened and parsed for each request. Consider a page
in a directory with a simple access control file. This hypothetical
page links to ten images that live in the same directory. Typically,
only the text in the HTML file is sensitive information, but
all the files in the directory are protected, and each
results in a unique request. That means that the .htaccess
file is opened and parsed eleven times each time the page is read.
In the case of CSENetID auth, it's even more costly, because each
request results in a substantial amount of computation as the cookie
is validated.
Remember, too, that auth control applies to an entire directory
tree. If there are subdirectories, each and every request for resources
in the tree causes I/O and computation to occur.
There are three main approaches to cheapening the use of auth:
- Keep all non-sensitive content outside of directories with
auth. For example, put all your images in a distinct directory tree from
your secured documents. That limits the number of times the auth control
files must be read and processed.
- Use the
<Files> and
<FilesMatch> directives to limit the application
of auth to truly sensitive files. That limits computation.
- For frequently-requested files, ask the web server administrator to
add auth directives to the server configuration files. Those files are read
once at server startup instead of for each request.
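The second approach might look like the following sketch, which assumes basic auth and uses a hypothetical realm name and password file path. Only files ending in .html trigger the password check, so images in the same directory are served without it. The .htaccess file itself is still read for every request in the directory.

```apache
# Sketch only: the realm name and password file path are hypothetical.
<FilesMatch "\.html$">
AuthType Basic
AuthName "Sensitive pages"
AuthUserFile /cse/www/doggy/.htpasswd
Require valid-user
</FilesMatch>
```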
This mechanism uses a list of hostname specifications to be denied
access and a list of hostname specifications to be allowed access to
decide whether to permit access to a particular host. There are two
flavors to this: "first list those to be denied access, then
explicitly allow access to others," and "first list those to be
allowed access, then those to be denied access." These two approaches
use the following incantations, respectively:
order deny,allow
order allow,deny
So, for example, to allow all hosts from the cs.washington.edu
domain access to index.html, but no others, one specifies
<Files index.html>
order deny,allow
deny from all
allow from cs.washington.edu
</Files>
To specifically exclude hosts from hacker.org from accessing
.html files, specify
<FilesMatch "\.html$">
order allow,deny
allow from all
deny from hacker.org
</FilesMatch>
N.B.: A very common error is to put a space after the comma
in the order directive. That simple error will generate a
server error and have the effect of denying access to all comers.
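To make the pitfall concrete, here is a small illustration; the wrong form is commented out so that the fragment as a whole remains valid:

```apache
# Wrong: the space after the comma produces a server error.
# order deny, allow

# Right: no whitespace between the two keywords.
order deny,allow
deny from all
allow from .cs.washington.edu
```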
With our server, this hostname/IP-based access control is
implemented by a server module called mod_authz_host (known as
mod_access in Apache versions before 2.2). See the Apache
documentation for that module.
Hostname Authorization Tool
This form will allow the creation of two commonly-needed types of
hostname-based authorization files:
- to allow all hosts access
- to allow some hosts access (up to four DNS domains)
Fill in the form below, press Submit, then copy the temporary
file we create for you in /cse/www/tmp/ to a file called
.htaccess in the directory you wish to protect.
Password-based auth comes in these flavors here at CSE:
- basic auth
- Basic auth depends upon one or two files (besides .htaccess) to
specify which users to authorize:
- a group file that specifies which usernames are accepted
- a password file that specifies the associated passwords
In the (very common) case where a single username is sufficient, or
when you want to allow access to any user listed in the
password file, only the latter (password) file is necessary. These
files are conventionally called .htgroup and
.htpasswd, respectively. The group file is generated with a
text editor, while the password file is generated with a tool called
htpasswd.
It is considered a security hazard to place these files in any
directory published by the web server, because they could be analyzed
by a malicious user. For example, an attempt could be made to break a
password by searching a dictionary for words that encrypt to one
of the values in the password file. Nevertheless, this is how we
typically manage the files at CSE, because it simplifies
administration. Therefore, in the event that web resources to be
controlled by basic auth are particularly sensitive, please contact the webmaster to arrange
a more secure strategy, such as storing your authentication files in a
directory not exported by the web server.
- digest
- Digest auth is similar to basic auth-- there is a file of
encrypted passwords on the server that is consulted to authenticate
users, and the content provider provides a list of usernames from that
file (or groups of usernames from that file) that are authorized to
access the affected content. But unlike basic auth, the password does
not cross the wire, and the passwords on the server are strongly
encrypted-- two key advantages. Digest auth uses the MD5 one-way hash
algorithm on both the server and client side, while basic auth uses
the deprecated crypt algorithm on the server side and effectively no
encryption algorithm on the client side. On the downside, some very
old web browsers don't support digest auth (Netscape 4 is an
example). Use the htdigest program to generate the password
file. See
Digest authentication in the apache documentation for
details.
- CSENetID
- With the CSENetID scheme, users authenticate to a "web login"
service on an SSL-enabled "secure" server, using the kerberos login
names and passwords for their CSE accounts. There is therefore no need for a
.htpasswd file, though you still need to specify which
users are to be granted access: either your own list of users, or the
name of a predefined group such as fac_cs, grad_cs,
or one of a few others for which the web server administrator has
created an entry in a system-wide web groups file
(/www/auth/group). You have, then, these choices:
- If you grant access to all authenticated CSE users,
specify the valid-user pseudo-user in the
.htaccess file.
- If you grant access to a single user, only that username is
needed. That will be stored in the .htaccess file.
- If you grant access to a group of users that you wish to
define, you must specify a group name (stored in the
.htaccess file) and a list of user names (stored in a
.htgroup file).
- If you grant access to a group of users that the system defines,
you must specify the name of the group, which is stored in the
.htaccess file. To see what system groups are defined and
which ones you or any other CSE user are a member of, click Web
Group Viewer.
Example 1: In this example, anybody that has a CSENetID is
authorized:
authtype csenetid
authname "anything at all"
require valid-user
Example 2: In this example, only users rose and
sank are authorized:
authtype csenetid
authname "anything at all"
require user rose sank
Example 3: In this example, only members of a group are
authorized:
authtype csenetid
authname "anything at all"
require group cse666_members
csenetidauthgroupfile cse666
The membership of the group named cse666_members is
defined in the file called cse666, which, in this case, is in
the same directory as the .htaccess file. Here's what that file
might contain:
cse666_members: rose mitnick burr brown
(The main reason to use a group file instead of listing the users
directly in the .htaccess file is so you can reuse that list
of users elsewhere without having to edit all the files whenever the
membership changes.)
- UWNetID
- UWNetID-- developed at UW by C&C, and now an open source
project called "PubCookie"--is quite similar to CSENetID. Key
functional differences include:
- user credentials (usernames and passwords) are based upon UW
accounts instead of CSE accounts
- UWNetID-protected resources must be accessed via SSL (that is,
via the HTTPS protocol)
- The authtype is "UWNetID"
Most CSE web servers support UWNetID.
Example 1: anybody with UW credentials is authorized.
authtype uwnetid
authname "anything you like"
require valid-user
As you can see, this is precisely the same content as used for CSENetID,
save the token following authtype.
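Returning to digest auth, which has no example above: the following is a minimal sketch of a digest-protected directory, assuming Apache 2.2 directive names and a hypothetical realm, username, and file path. The realm given to htdigest must match the AuthName value exactly, because the realm is part of the stored hash.

```apache
# Sketch only. Create the password file first (hypothetical path and user):
#   htdigest -c /cse/www/doggy/.htdigest "Doggy Docs" measles
AuthType Digest
AuthName "Doggy Docs"
AuthDigestProvider file
AuthUserFile /cse/www/doggy/.htdigest
Require user measles
```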
Username/Password Authorization Tool
This tool creates auth control files for the following commonly-needed
types of auth:
- Basic auth (with one or more username/password pairs you specify)
- CSENetID auth (with one or more usernames you specify)
- CSENetID auth with a system-defined group
- CSENetID auth that allows all CSE users access
To use this tool to generate auth files for password-based auth,
fill in the form below and press Submit. We'll create
temporary files for you to copy to your content directory.
To combine hostname-based auth with password-based auth - in other
words, to use both the Require and Allow directives
- our web server software provides the Satisfy
directive. Satisfy any means to permit access to those users
that satisfy any of the authorization requirements associated with the
content - for example, either browsing from a local host or
supplying the requested username/password - while Satisfy all
requires that all the authorization requirements be met - for example,
both browsing from a local host and supplying the requested
username/password.
Below are examples of two .htaccess files that use the
satisfy any construct to control access to the URL
http://www.cs.washington.edu/doggy/. In these examples, users
browsing from outside the .cs.washington.edu domain will be
prompted for credentials, while those browsing from a local host will
be allowed access without being shaken down for proof of their
identities. (We don't have a tool that supports hybrid auth yet.)
CSENetID Auth:
order deny,allow
deny from all
Allow from .cs.washington.edu
AuthName "Lamer"
AuthType CSENetID
Require valid-user
Satisfy any
Basic Auth:
order deny,allow
deny from all
Allow from .cs.washington.edu
AuthUserFile /cse/www/doggy/.htpasswd
AuthName "Lamer"
AuthType Basic
Require user measles
Satisfy any
Specifying all instead of any in these examples
would mean that users must both browse from a local host
and supply the credentials.
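For instance, the Basic Auth example with all substituted for any would read as follows (a sketch reusing the same file path and username):

```apache
# Sketch only: both the host check and the password check must pass.
order deny,allow
deny from all
Allow from .cs.washington.edu
AuthUserFile /cse/www/doggy/.htpasswd
AuthName "Lamer"
AuthType Basic
Require user measles
Satisfy all
```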
A word about the AuthUserFile
directive: if the argument path starts with a /, it needs to
be a full path to the file; otherwise, it's relative to the "server
root." The server root for www.cs.washington.edu is /cse/www,
so an equivalent to /cse/www/doggy/.htpasswd would be
doggy/.htpasswd. The web server also understands all the
locally-supported "canonical paths," such as /homes/june/,
/homes/gws/, /homes/iws/, /cse/, and
/projects/. The basic rule is that the web server needs to be
able to find and read the file or web access to your resource will be
denied to all users.
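The two equivalent forms described above, side by side (the second is commented out because only one AuthUserFile line should be active):

```apache
# Absolute path: starts with a /.
AuthUserFile /cse/www/doggy/.htpasswd

# Equivalent relative path, resolved against the server root /cse/www.
# Use one form or the other, not both.
# AuthUserFile doggy/.htpasswd
```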
- How effective is "security through obscurity" at protecting my
web resources? That is, if it isn't linked, can others still find
it?
- There are two ways that people might find unlinked
resources. Firstly, local research users can often see your web
files in the file system. Secondly, your URLs will appear in web
logs-- certainly locally, and perhaps remotely. Our web logs are
exported to all CSE research hosts as /cse/www/logs/. And,
we publish a nightly logfile analysis (http://www.cs.washington.edu/usage/) that is
accessible to all users with a CSE account; the most popular 200
URLs for the past week are listed in that report. Also, if your document links to remote
resources, the URLs of your documents are likely to appear in the
HTTP_REFERER fields of web logfile entries at the sites
where the linked documents are hosted.
- My users don't all have CSE or UW accounts, so I'm forced to use
basic auth. How can I restrict access to allow only HTTPS?
- Use the SSLRequireSSL directive in your .htaccess
file. Users who try to access your resource via HTTP will then get an
error. Or consider using digest auth.
- I have a resource to share with both CSE and non-CSE users. How can
I do that?
- You can't mix password-based authentication mechanisms such as
basic auth, CSENetID, and UWNetID to access the same URL. So your
choices are (1) using basic or digest auth, (2) restricting access
to certain named hosts, or (3) making the same resource appear at
distinct URLs with distinct authentication policies for each. Below,
I explain one way to implement option (3). (N.B.: the audience for
this answer is technical users only.)
Note these relevant facts:
- CSE web servers are configured to follow symbolic links
(AKA "symlinks").
- Authentication directives apply to the contents of a directory and all
subdirectories (unless you use the <Files> directive to limit
the scope thereof).
Consider as an example the following directory tree:
~user/www/
    cseonly/
        .htaccess1
        content/
    basic/
        .htaccess2
        .htpasswd
        content@
We wish to restrict access to the URL
http://www.cs.washington.edu/homes/user/cseonly/content/ to
any user with CSENetID credentials. The contents of
.htaccess1 would be:
authtype csenetid
require valid-user
Note that ~user/www/basic/content is a symlink to
~user/www/cseonly/content/.
We wish to restrict access to the
URL http://www.cs.washington.edu/homes/user/basic/content/ to
those who know the username/password "basicly"/"ylcisab". The contents of
.htaccess2 would be:
authtype basic
authname "Basic Auth Required"
authuserfile /homes/june/user/www/basic/.htpasswd
require user basicly
(Of course, you would need to create
/homes/june/user/www/basic/.htpasswd with an entry for user
basicly first.) Because the URLs are distinct, but the content
is the same, the goal of mixing password-based authentication mechanisms
is met.
- authentication
- authentication refers to the process of establishing the
identity of users.
- authorization
- authorization refers to the process of controlling access to
resources.
- authentication realm
- authentication realm is a symbolic name you may give to the
resource being protected. This helps the user decide what
username/password to provide when basic auth is
used.
- cookie
- An HTTP cookie is a small chunk of data that's generated by a
web server and passed back and forth between the server and browser on
each request within a document tree. It's primarily a way to maintain
"state."
- credentials
- Proof of identity. For example, in the case of CSENetID auth, login name and password.
- SSL
- SSL is the Secure Sockets Layer, the technology used
to encrypt the conversation between a web browser and a web server.
Comments on this file to webmaint.