If so, then some of these instructions may not apply to you. Make sure you have read the instructions on installing software for working from home, and in particular that you understand how to adapt these instructions to a personal PostgreSQL installation.
As you know, we will be using Microsoft SQL Server on IISQLSRV in this class. However, to reduce the load on IISQLSRV, and to gain experience with multiple database systems, we will be also be using the open-source PostgreSQL database system, running on the Windows Lab machines with data stored in your private files.
This tutorial will tell you most of what you need to know to get started with PostgreSQL for CSE 444. If you would like to learn more about PostgreSQL, the official documentation includes a more detailed tutorial on using PostgreSQL. (We recommend skipping the tutorial's material on SQL, especially joins; refer to your book and the lecture notes instead.)
Unlike SQL Server, PostgreSQL is not usually used through a graphical interface. Rather, the server and client tools are typically accessed from a command line. To get a command shell that is set up to run these tools on the Lab machines, download and run the PostgreSQL shell launcher script. You should only run this script from a Windows Lab machine (or some computer set up the same way).
To test that the script has set up the tools paths correctly, type psql -?
at the prompt,
and hit Enter. You should see a usage message similar to the following:
This is psql 8.4, the PostgreSQL interactive terminal. Usage: psql [OPTIONS]... [DBNAME [USERNAME]] [[Continues...]]
Both SQL Server and PostgreSQL store databases in ordinary files within the filesystem. On IISQLSRV, all the data files are located in a single folder which you cannot access. When you use PostgreSQL, however, you will create a data folder just for yourself. To create the data files, do the following:
C:
or D:
drive, for fast access, or somewhere that is
accessible to all Windows Lab machines, such as your Windows or Unix home folders, to avoid
needing to back up the data folder whenever you log out from the machine.
For the rest of this example I will assume you are using the
Z:\pgsql_data
folder, which is just inside your Windows home folder (Z:\
).initdb
command to create the data files:
> initdb -D Z:\pgsql_data
As the database files are prepared, you will see log messages similar to the following:
The files belonging to this database system will be owned by user "michaelr". This user must also own the server process. [[Continues...]] Success.
To start PostgreSQL, run the pg_ctl start
command by choosing
Start->Run, giving the name of the data folder you created earlier:
> "C:\Program Files\PostgreSQL\8.4\bin\pg_ctl.exe" start -D Z:\pgsql_data
This will pop up a shell window that displays server starting
among other messages. When done, you can stop PostgreSQL either by hitting
Ctrl-C in the shell window, or by using the pg_ctl stop
command:
> "C:\Program Files\PostgreSQL\8.4\bin\pg_ctl.exe" stop -D Z:\pgsql_data
Note the following:
pg_ctl
directly from a PostgreSQL shell prompt; however, this is no longer recommended as we have found it too confusing to have the output from PostgreSQL mixed with the interactive shell program, and it's too easy to quit PostgreSQL by typing Ctrl-C to another program.-l
(lowercase L) option to pg_ctl start
:
> "C:\Program Files\PostgreSQL\8.4\bin\pg_ctl.exe" start -D Z:\pgsql_data -l Z:\pgsqllog.txt
Just as in SQL Server, your tables must be placed in a database, which you must create.
To create a database named my_database
, do the following:
createdb
command to create the database:
> createdb my_database
Note that, unlike SQL Server, PostgreSQL is case-sensitive when looking up database and table names. However, PostgreSQL automatically lowercases all names given in SQL code, so for you, the case-sensitivity should only affect non-SQL code that needs to know the database name, including:
createdb
commandpsql
SQL query tool, discussed belowIn such code, always spell the database name exactly as you created it, respecting case.
To run SQL queries on SQL Server, you use SQL Server Management Studio (aka "SSMS" or "SqlWb").
In PostgreSQL, you use the
psql
utility. This command-line program has fewer features
of SSMS; however, psql
is easy to use, provided you don't have so much data that a plain-text display
of that data becomes unreadable. If you prefer a GUI interface for running queries, try pgAdmin III, located in
the Start->Programs menu under "PostgreSQL 8.4" on the Windows Lab machines, or check out
the PostgreSQL wiki's list of GUI query tools.
Once PostgreSQL is running, you can run psql by opening a PostgreSQL shell and typing
> psql my_database
where my_database
is the name of the database you want to use. (If you omit
the database name, psql defaults to accessing the database with the same name as your CSE username.)
When psql opens, it will show a message like the following:
Welcome to psql 8.4, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit michaelr=#
The line michaelr=#
is the prompt for SQL statements which are sent to the database server,
or non-SQL commands interpreted by psql. Here, "michaelr" is the name of the database; it may differ
on your system.
As the message suggests, you exit psql by typing \q
and hitting Enter (note
the lack of a semicolon; this is required because \q
is not an SQL statement and is not
interpreted at the server).
To run an SQL statement, just type it in. SQL statements can be split across multiple lines; to send the SQL statement to the server, end the statement with a semicolon and hit Enter. Depending on the command, psql will either respond with a confirmation message:
michaelr=# DELETE FROM hw1_data WHERE name='name';
DELETE 1
or display the results of the query in a table:
michaelr=# SELECT * FROM hw1_data;
name | discount | month | price
--------+----------+-------+-------
bar1 | 15% | apr | 19
bar8 | 15% | apr | 19
gizmo3 | 15% | apr | 19
gizmo7 | 15% | apr | 19
mouse1 | 15% | apr | 19
bar1 | 15% | aug | 19
bar8 | 15% | aug | 19
gizmo3 | 15% | aug | 19
gizmo7 | 15% | aug | 19
mouse1 | 15% | aug | 19
bar1 | 33% | dec | 19
bar8 | 33% | dec | 19
gizmo3 | 33% | dec | 19
[[Continues...]]
(426 rows)
If the result table is too large to fit in the shell window, it will be shown one window-ful at a time; press Enter to go on to the next window, until the end.
If you made a mistake while typing in a query, you can use the up-arrow and down-arrow keys on the keyboard to move between previously entered lines, which you can then edit and resubmit.
As in SQL Server Management Studio, you can use psql to run SQL code from an external file as
well as from interactive input. This can be done with the \i
psql command:
michaelr=# \i 'query.sql'
Note that psql follows PostgreSQL in allowing backslash escape sequences in character strings. This means that
an absolute pathname like D:\subdir\query.sql
must be written either by
doubling the backslashes, as in 'D:\\subdir\\query.sql'
, or turning them into forward slashes,
as in 'D:/subdir/query.sql'
.
Alternativeley, you can run psql with the query file directly from the shell:
> psql -f "query.sql" my_database
You can send the output of queries to a file instead of
(not in addition to) your console with the \o
psql command:
michaelr=# \o 'query_output.txt'
Note that the SQL code of queries will not be saved to the file. To stop saving query output and send it to your console again, use the same command, but omit the filename:
michaelr=# \o
You can import data from a file on the client computer into an existing database table using
the \copy
psql command:
michaelr=# \copy hw1_data from 'hw1-data.txt'
For the first homework, there is an SQL script that imports the data from the data file into a new table hw1_data. This script uses the \copy command internally.
In SQL Server Management Studio, you can view information about the columns, constraints, and indices on a table
through the tree view on the left side of the SSMS window. To get similar information in PostgreSQL,
you use the \d
psql command:
michaelr=# \d hw1_data
Table "public.hw1_data"
Column | Type | Modifiers
----------+-----------------------+-----------
name | character varying(50) |
discount | character varying(50) |
month | character varying(50) |
price | character varying(50) |
In SSMS, you can request that the estimated plan for a query be displayed by choosing the Query->Display Estimated Execution Plan menu item. The equivalent in PostgreSQL is the EXPLAIN statement of SQL, which produces a plain-text representation of the physical query plan. (EXPLAIN is not part of the SQL standard and does not appear in SQL Server. In SQL Server, the SET SHOWPLAN and SET STATISTICS families of statements provide similar functionality to EXPLAIN, but the syntax is more awkward.)
Here is an example of the use of EXPLAIN on a simple query:
michaelr=# EXPLAIN SELECT * FROM hw1_data;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on hw1_data (cost=0.00..7.26 rows=426 width=17)
(1 row)
SSMS also lets you view the actual plan for a query once the query has been executed, by turning on the Query->Include Actual Execution Plan menu option before running the query. The equivalent function in PostgreSQL is the EXPLAIN ANALYZE variation of the EXPLAIN statement:
michaelr=# EXPLAIN ANALYZE SELECT * FROM hw1_data;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on hw1_data (cost=0.00..7.26 rows=426 width=17)
(actual time=0.011..0.183 rows=426 loops=1)
Total runtime: 0.390 ms
(2 rows)