CSE 344 Homework 3

Objectives:
To practice advanced SQL and get familiar with the SQL Server database management system. We will reuse SQL Server in future homeworks.
Assignment tools:
SQL Server on IISQLSRV.
Due date:
October 24, 2011, at 11:59 pm Dropbox
What to turn in:
hw3-queries.sql

This homework is a continuation of homework 2, letting you practice more advanced SQL on the IMDB database. Here it is again for your reference:

ACTOR (id, fname, lname, gender)
MOVIE (id, name, year)
DIRECTORS (id, fname, lname)
CASTS (pid, mid, role)
MOVIE_DIRECTORS (did, mid)
GENRE (mid, genre)

All id fields are integers. MOVIE.year is an integer. All other fields are character strings.

id column in ACTOR, MOVIE & DIRECTOR tables is a key for the respective table.

CASTS.pid refers to ACTOR.id
CASTS.mid refers to MOVIE.id
MOVIE_DIRECTORS.did refers to DIRECTORS.id
MOVIE_DIRECTORS.mid refers to MOVIE.id
GENRE.mid refers to MOVIE.id

In this homework, you will do two things. First, you will connect to a DBMS server running on a machine called IISQLSRV. The DBMS that we will use in this assignment is Microsoft SQL Server. We already imported and indexed the IMDB database on that server. In the remainder of the assignment, you will simply write SQL queries. These queries, however, are challenging.

A. Connecting to SQL Server on IISQLSRV (3 points):

Follow the instructions here to connect to the database server.

We also have an instance of SQL Server running in the Microsoft Azure Cloud. You can use this server just like you use iisqlsrv.cs.washington.edu. The only differences are:

For fun, try and run some of the queries from the previous homework and see how fast they run compared to your local SQLite application.

 

B. SQL QUERIES (72 points; 12 points per question):

For each question below, write a single SQL query to answer that question. Add a comment to each query indicating the question number and the number of rows your query returns.

  1. Consider all actors that had five or more roles in a movie in 2010. In homework 2, we asked you to list each such actor's name, the movie name, and the number of roles he/she played. Do the same thing, but instead of giving the number of roles, give the name of each role. Your answer should have one tuple for each combination of (actor, movie, role) - so if an actor has 10 roles in a given movie, there should be 10 tuples for that actor and movie. Approx. 140 rows.

  2. For each year, count the number of movies in that year that had only female actors. Recall the meaning of the universal quantifier: a movie without any actors is also a movie with only female actors (since there are no male actors in such a movie!). Approx. 130 rows.

  3. Now make a small change: for each year, report the percentage of movies with only female actors made that year, and also the total number of movies made that year. For example, one answer will be:

    1990 31.81 13522

    meaning that in 1990 there were 13,522 movies, and 31.81% had only female actors. You do not need to round your answer. Approx. 130 rows.

  4. Find the film(s) with the largest cast. Return the movie title and the size of the cast. By "cast size" we mean the number of distinct actors that played in that movie: if an actor played multiple roles, or if the actor is simply listed more than once in CASTS, we still count her/him only once. You may not assume that only one film has the largest cast. 1 row.

  5. A decade is a sequence of 10 consecutive years. For example 1965, 1966, ..., 1974 is a decade, and so is 1967, 1968, ..., 1976. Find the decade with the largest number of films. 1 row.

  6. The Bacon number of an actor is the length of the shortest path between the actor and Kevin Bacon in the "co-acting" graph. That is, Kevin Bacon has Bacon number 0; all actors who acted in the same film as KB have Bacon number 1; all actors who acted in the same film as some actor with Bacon number 1 (but not with Bacon himself) have Bacon number 2, etc. Count how many actors have Bacon number is 2. 1 row.

Put all your code for part B (SELECT-FROM-WHERE code) in a file called hw3-queries.sql.