The problem people had the most difficulty with was problem 6. Since it is such a fundamental concept (and it relates directly to something that is presumably familiar to you), I think it's important to really understand it.

====
6. (14 points total) After graduating from the University of Washington, you have landed a highly coveted software engineer position at Minisquish Corporation. Despite president and CEO Gill Bates's objections, the employees at the company have demanded that they each get a web page. Each web page has a fixed set of links to other pages in the company. However, at Bates's request, no web page is allowed to have links to web pages outside the company.

a. (8 points) Suppose you are given a list of web pages and, for each web page, which web pages it links to. Describe an efficient algorithm (one that is as fast as possible) that computes, for *all* web pages, the smallest number of mouse clicks needed to get from that web page to Gill Bates's web page.

b. (4 points) How long does your algorithm take in the worst case? (Be sure to clearly define any variables you use in your answer.)

c. (2 points) Describe a situation where it would be impossible to get to Gill Bates's web page from your web page by following links (starting at your web page).
====

The biggest challenge in a problem like this (and in "real-life" problems) is understanding what is being asked. We want to know, for *all* web pages, how many clicks it takes to get from that page to Gill Bates's web page. In other words, each web page has a number associated with it (the number of clicks to get to Bates's page), and we have to compute all of those numbers.

We are given a list of web pages and, for each web page, a list of the web pages it links to. Since we are talking about a network here, a graph is the straightforward way to represent the information: each vertex represents a web page, and each edge represents a link (or a mouse click). The big question is what exactly the graph is.

If we go with the natural way to define the graph, G = (V, E), where V is the set of web pages and (u, v) is an edge in E if there is a link from web page u to web page v, then we run into problems. The best algorithm I know of for this graph is to run the unweighted shortest path algorithm (breadth-first search) from each vertex, stopping each search as soon as it reaches Bates's page. A single BFS takes O(|V| + |E|) in the worst case, so the total running time for this algorithm would be O(|V|^2 + |E||V|).

We can do better. We would like to start a single search from Bates's web page, so that we only have to do one BFS. But if we did that on the graph above, we would find out how many clicks it takes to get from Bates's web page to every other web page, which might be a different number from the one we want. For example, if there are three web pages, X, Y, and Bates, and X links to Y, Y links to Bates, and Bates links to X, then the number of clicks from Bates to X is 1, whereas the number of clicks from X to Bates is 2 (which is what we want to compute).

The key is to reverse the edges in the graph, so that a path from Bates's vertex to some vertex X in the new graph corresponds to a (backwards) sequence of mouse clicks from X to Bates's web page.

Having thought through all this, we can now formulate a concise answer to problem 6:

====
a. Construct a graph G = (V, E), where each vertex in V is a web page, and (u, v) is in E if v has a link to u. (Notice that the edges are "reversed" from the information we are given.) Then perform a breadth-first search (the unweighted shortest path algorithm) from Gill Bates's web page, labeling each vertex with its distance from Bates's web page. (The code in Figure 9.18 in Weiss does this for us.)

b. Breadth-first search takes O(|V| + |E|) time, where |V| is the number of web pages and |E| is the total number of links. (Note that constructing the "backwards" graph G above also takes O(|V| + |E|) time.)

c. It is impossible to get from a web page v in V to Bates's web page exactly when there is no path from Bates's web page to v in G (where G is the reversed graph defined in the answer to part a).
====
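To make the answer to part a concrete, here is a minimal sketch in Python. The dictionary input format and the function name clicks_to_bates are my own illustrative choices, not part of the exam solution; the algorithm itself is exactly the reversed-graph BFS described above. Unreachable pages end up with a count of None, which is the part c situation.

    from collections import deque

    def clicks_to_bates(links, bates):
        """Given links[page] = list of pages that `page` links to,
        return clicks[page] = fewest clicks from `page` to `bates`,
        or None if Bates's page is unreachable from `page` (part c)."""
        # Build the "backwards" graph: reverse every link.  Every link
        # target is itself a company page, so it appears as a key in links.
        reversed_graph = {page: [] for page in links}
        for page, targets in links.items():
            for target in targets:
                reversed_graph[target].append(page)

        # BFS from Bates's page in the reversed graph.  A path of length d
        # from bates to v here corresponds to a d-click path from v to
        # bates in the original link structure.
        clicks = {page: None for page in links}
        clicks[bates] = 0
        queue = deque([bates])
        while queue:
            u = queue.popleft()
            for v in reversed_graph[u]:
                if clicks[v] is None:      # not yet visited
                    clicks[v] = clicks[u] + 1
                    queue.append(v)
        return clicks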
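As a quick sanity check (again, purely illustrative), running the sketch on the three-page example from the discussion above, plus a fourth page W that links nowhere, gives the counts we worked out by hand and shows the part c situation:

    links = {
        "X": ["Y"],
        "Y": ["Bates"],
        "Bates": ["X"],
        "W": [],   # W has no links, so Bates's page is unreachable from W
    }
    print(clicks_to_bates(links, "Bates"))
    # prints {'X': 2, 'Y': 1, 'Bates': 0, 'W': None}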
Common mistakes people made on the exam:

1. Computing the number of clicks for one page, but not for all pages.

2. Thinking that this could be solved using a minimum spanning tree algorithm.

3. Constructing the "forward" edge graph and then performing a BFS from each vertex. This produces a correct answer, but it is inefficient (by a factor of |V|).

4. Thinking that cycles have something to do with being able to get from a web page to Bates's web page. The only thing that matters for part c is whether there is a path from Bates's page to your page in the "backwards" graph G above (equivalently, a path from your page to Bates's page in the "forward" graph that you would probably think of first).

Thought question: Why was it important in this problem that no web page in the company was allowed to have links to web pages outside the company?