The problem people had the most difficulty with was problem 6. Since it is such a fundamental concept (and it relates directly to something that is presumably familiar to you), I think it's important to really understand it.

====
6. (14 points total) After graduating from the University of Washington, you have landed a highly coveted software engineer position at Minisquish Corporation. Despite president and CEO Gill Bates's objections, the employees at the company have demanded that they each get a web page. Each web page has a fixed set of links to other pages in the company. However, at Bates's request, no web page is allowed to have links to web pages outside the company.

a. (8 points) Suppose you are given a list of web pages and, for each web page, which web pages it links to. Describe an efficient algorithm (one that is as fast as possible) that computes, for *all* web pages, the smallest number of mouse clicks needed to get from that web page to Gill Bates's web page.

b. (4 points) How long does your algorithm take in the worst case? (Be sure to clearly define any variables you use in your answer.)

c. (2 points) Describe a situation where it would be impossible to get to Gill Bates's web page from your web page by following links (starting at your web page).
====

The biggest challenge in a problem like this (and in "real-life" problems) is understanding what is being asked. We want to know, for *all* web pages, how many clicks it takes to get from that page to Gill Bates's web page. In other words, each web page has a number associated with it (the number of clicks to get to Bates's page), and we have to compute all of those numbers.

We are given a list of web pages and, for each web page, a list of the web pages it links to. Since we are talking about a network here, a graph is the straightforward way to represent the information: each vertex represents a web page, and each edge represents a link (or a mouse click). The big question is what exactly the graph is.

If we go with the natural way to define the graph, G = (V, E), where V is the set of web pages and (u, v) is an edge in E if there is a link from web page u to web page v, then we run into problems. The best algorithm I know of for this graph is to run the unweighted shortest path algorithm (breadth-first search) from each vertex, stopping each search as soon as it reaches Bates's page. A single BFS takes O(|V| + |E|) in the worst case, so the total running time for this algorithm would be O(|V|^2 + |E||V|).

We can do better. We would like to start a single search from Bates's web page, so that we only have to do one BFS. But if we did that on the graph above, we would find out how many clicks it takes to get from Bates's web page to every other web page, which might be a different number from the one we want. For example, if there are three web pages, X, Y, and Bates, and X links to Y, Y links to Bates, and Bates links to X, then the number of clicks from Bates to X is 1, whereas the number of clicks from X to Bates is 2 (which is what we want to compute).

The key is to reverse the edges in the graph, so that a path from Bates's vertex to some vertex X in the new graph corresponds to a (backwards) sequence of mouse clicks from X to Bates's web page.

Having thought through all this, we can now formulate a concise answer to problem 6:

====
a. Construct a graph G = (V, E), where each vertex in V is a web page, and (u, v) is in E if v has a link to u. (Notice that the edges are "reversed" from the information we are given.) Then perform a breadth-first search (the unweighted shortest path algorithm) from Gill Bates's web page, labeling each vertex with its distance from Bates's web page. (The code in Figure 9.18 in Weiss does this for us.)

b. Breadth-first search takes O(|V| + |E|) time, where |V| is the number of web pages and |E| is the total number of links. (Note that constructing the "backwards" graph G above also takes O(|V| + |E|) time.)

c. It is impossible to get from a web page v in V to Bates's web page exactly when there is no path from Bates's web page to v in G (where G is the reversed graph defined in the answer to part a).
====
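To make the answer to part a concrete, here is a minimal sketch in Python. The dictionary input format and the function name clicks_to_bates are my own illustrative choices, not part of the exam solution; the algorithm itself is exactly the reversed-graph BFS described above. Unreachable pages end up with a count of None, which is the part c situation.

    from collections import deque

    def clicks_to_bates(links, bates):
        """Given links[page] = list of pages that `page` links to,
        return clicks[page] = fewest clicks from `page` to `bates`,
        or None if Bates's page is unreachable from `page` (part c)."""
        # Build the "backwards" graph: reverse every link.  Every link
        # target is itself a company page, so it appears as a key in links.
        reversed_graph = {page: [] for page in links}
        for page, targets in links.items():
            for target in targets:
                reversed_graph[target].append(page)

        # BFS from Bates's page in the reversed graph.  A path of length d
        # from bates to v here corresponds to a d-click path from v to
        # bates in the original link structure.
        clicks = {page: None for page in links}
        clicks[bates] = 0
        queue = deque([bates])
        while queue:
            u = queue.popleft()
            for v in reversed_graph[u]:
                if clicks[v] is None:      # not yet visited
                    clicks[v] = clicks[u] + 1
                    queue.append(v)
        return clicks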
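As a quick sanity check (again, purely illustrative), running the sketch on the three-page example from the discussion above, plus a fourth page W that links nowhere, gives the counts we worked out by hand and shows the part c situation:

    links = {
        "X": ["Y"],
        "Y": ["Bates"],
        "Bates": ["X"],
        "W": [],   # W has no links, so Bates's page is unreachable from W
    }
    print(clicks_to_bates(links, "Bates"))
    # prints {'X': 2, 'Y': 1, 'Bates': 0, 'W': None}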
Common mistakes people made on the exam:

1. Computing the number of clicks for one page, but not for all pages.

2. Thinking that this could be solved using a minimum spanning tree algorithm.

3. Constructing the "forward" edge graph and then performing a BFS from each vertex. This produces a correct answer, but it is inefficient (by a factor of |V|).

4. Thinking that cycles have something to do with being able to get from a web page to Bates's web page. The only thing that matters for part c is whether there is a path from Bates's page to your page in the "backwards" graph G above (equivalently, a path from your page to Bates's page in the "forward" graph that you would probably think of first).

Thought question: Why was it important in this problem that no web page in the company was allowed to have links to web pages outside the company?