CSE143 Notes for Friday, 10/21/05

I began by reviewing the concept of interfaces. I mentioned that the basic idea is to specify a certain set of behaviors without specifying how those behaviors are implemented. As an example, I asked people to think about sorting a list of items. We can put numbers into sorted order and we can put words into sorted order. But can we sort absolutely anything? The answer is no. Not every kind of data has an ordering to it.

So then I turned to a specific class to think about how this might work. I said to consider the String class. It has a boolean method called "equals" that allows you to test whether two Strings are equal. But how would we find out if one String is less than another or greater than another? We might imagine having boolean methods like "less" or "greater", but instead the String class has a single compareTo method. Given a couple of Strings:

        String str1 = "hello";
        String str2 = "there";

You can imagine asking for:

        str1.compareTo(str2)

or asking for:

        str2.compareTo(str1)

What kind of value should compareTo return? It can't be boolean because there are three possible answers (less, equal, greater). Someone suggested that it could return an integer and that we could decide that -1 means "less", that 0 means "equal" and that 1 means "greater". This is very close to how it actually works, except for the fact that Sun decided to give the implementor of compareTo a little more flexibility in what value to return. The rule in Java is that:

if compareTo returns a negative integer, we interpret it to mean "less"
if compareTo returns 0, we interpret it to mean "equal"
if compareTo returns a positive integer, we interpret it to mean "greater"

Exactly what negative or positive integer is returned is left up to the person who implements the class. In the case of String, the method returns the difference between the character values at the first position where the two Strings differ (in the example, str1.compareTo(str2) returns ('h' - 't'), which equals -12, indicating that "hello" is less than "there"). If one String is a prefix of the other, so that no position differs, it returns the difference in their lengths instead.
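As a minimal sketch of the rule above, the following program prints the raw compareTo values for the two example Strings and then interprets only their sign. The class name CompareToDemo and the helper method describe are made up for illustration; only String.compareTo itself comes from the Java libraries.

```java
// Illustrative sketch: the class and method names here are hypothetical.
class CompareToDemo {
    // Interprets a compareTo result by its sign alone, never by its exact value.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return "less";
        } else if (result == 0) {
            return "equal";
        } else {
            return "greater";
        }
    }

    public static void main(String[] args) {
        String str1 = "hello";
        String str2 = "there";
        System.out.println(str1.compareTo(str2));     // -12, which is 'h' - 't'
        System.out.println(str2.compareTo(str1));     // 12, which is 't' - 'h'
        System.out.println(describe(str1, str2));     // less
        System.out.println(describe(str2, str1));     // greater
        System.out.println(describe(str1, "hello"));  // equal
    }
}
```

Code that tests for exactly -1 or 1 would fail on these Strings, which is why the check is written as "less than zero" and "greater than zero".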

The use of an integer compareTo method to return ordering information is a standard convention used throughout the Java class libraries. But since not all data can be ordered, not every class has a compareTo method. So how do we describe the subset of Java classes that have such a method? This is where interfaces come in. There is an interface in Java known as the Comparable interface. Some people pronounce it as "come-pair-a-bull" and some people pronounce it as "comp-ra-bull". Either pronunciation is okay. The interface contains a single method called compareTo.

        public interface Comparable<T> {
            public int compareTo(T other);
        }

Notice that this is a generic declaration using "T" to stand for "Type". And notice that we have just a method header for compareTo. Instead of a method body that would be enclosed in curly braces we have just a semicolon. That's because an interface indicates that the method should exist, but doesn't say how it is implemented.

Given this interface, we can define the subset of Java classes for which sorting makes sense. Anything that implements the Comparable interface can be sorted. This is a useful way to think about classes. For a class like String, it allows us to think about Strings in two different ways. We can think of it specifically as a String object or we can think of it as a Comparable object. A useful term for this idea is that String can fill different "roles". It obviously can fill the role of a String because it is a String, but it can also fill the role of a Comparable object because it's one of those as well.
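To make the idea concrete, here is a small sketch of a class that implements Comparable and can therefore be sorted by the library. The CalendarDate class and the CalendarDateDemo driver are hypothetical names invented for this example; Comparable, ArrayList, List and Collections.sort are the real library pieces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration: a month/day pair with a natural ordering.
class CalendarDate implements Comparable<CalendarDate> {
    private final int month;
    private final int day;

    CalendarDate(int month, int day) {
        this.month = month;
        this.day = day;
    }

    // Negative means "less", 0 means "equal", positive means "greater".
    // Months are compared first; days break ties within the same month.
    public int compareTo(CalendarDate other) {
        if (month != other.month) {
            return month - other.month;
        }
        return day - other.day;
    }

    public String toString() {
        return month + "/" + day;
    }
}

class CalendarDateDemo {
    // Builds a small unsorted list and sorts it using compareTo.
    static List<CalendarDate> sortedSample() {
        List<CalendarDate> dates = new ArrayList<CalendarDate>();
        dates.add(new CalendarDate(10, 21));
        dates.add(new CalendarDate(2, 14));
        dates.add(new CalendarDate(10, 3));
        Collections.sort(dates);  // allowed only because CalendarDate is Comparable
        return dates;
    }

    public static void main(String[] args) {
        // The same object can fill two roles: CalendarDate and Comparable.
        Comparable<CalendarDate> role = new CalendarDate(1, 1);
        System.out.println(role.compareTo(new CalendarDate(1, 2)) < 0);  // true
        System.out.println(sortedSample());  // [2/14, 10/3, 10/21]
    }
}
```

If the "implements Comparable<CalendarDate>" clause were removed, the call to Collections.sort would no longer compile, which is the compiler enforcing the certification.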

I mentioned that the best analogy I have for interfaces is that they are similar to how we use the concept of certification. You can't claim to be a certified doctor unless you have been trained to do certain specific tasks. Similarly, to be a certified teacher you have to know how to behave like a teacher, to be a certified nurse you have to know how to behave like a nurse, and so on. In Java, if you want to claim to be a certified Comparable object, then you have to know how to compare yourself to other objects like yourself (i.e., you need an appropriate compareTo method). Java classes are allowed to implement as many interfaces as they want to, just as a single person might be certified to do several jobs (e.g., both a certified doctor and a certified lawyer).

I said that in terms of the stack and queue structures we were looking at, we can describe abstractly the specific behaviors we are looking for and give that collection of behaviors a name like "Stack" or "Queue". That way we are specifying what it takes to be considered a stack or a queue. We have been using this concept for many years in programming. It is traditionally referred to as an Abstract Data Type or ADT:

        A bstract
        D ata
        T ype

With Java, we have actual language support for this concept. Not only can we talk abstractly about the stack ADT or the queue ADT, we can actually define an interface for each. This allows us to talk about the behavior of the ADT separately from the implementation of the behavior.
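One way such an interface might look is sketched below. This is not the Queue interface or LinkedQueue class used in the course; the names SimpleQueue, ListBackedQueue and SimpleQueueDemo, and the particular choice of methods, are assumptions made only to show the ADT separated from one possible implementation.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical ADT: says what a queue does, not how it does it.
interface SimpleQueue<E> {
    void enqueue(E value);  // add at the back
    E dequeue();            // remove from the front
    boolean isEmpty();
    int size();
}

// One possible implementation, backed by java.util.LinkedList.
class ListBackedQueue<E> implements SimpleQueue<E> {
    private final LinkedList<E> elements = new LinkedList<E>();

    public void enqueue(E value) {
        elements.addLast(value);
    }

    public E dequeue() {
        if (elements.isEmpty()) {
            throw new NoSuchElementException("queue is empty");
        }
        return elements.removeFirst();
    }

    public boolean isEmpty() {
        return elements.isEmpty();
    }

    public int size() {
        return elements.size();
    }
}

class SimpleQueueDemo {
    public static void main(String[] args) {
        SimpleQueue<String> queue = new ListBackedQueue<String>();
        queue.enqueue("first");
        queue.enqueue("second");
        System.out.println(queue.dequeue());  // first
        System.out.println(queue.size());     // 1
    }
}
```

Client code written against SimpleQueue would not need to change if ListBackedQueue were replaced by an array-based implementation.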

I then mentioned that this is an idea that has been used throughout the collections classes in Java (the java.util package). I read a short passage from the book "Effective Java" by Joshua Bloch, which I said was one of the most useful books I've read about Java. Joshua Bloch was the primary architect of the collections framework and has influenced much of Sun's work. He now works at Google.

In the collections framework, Bloch was careful to define ADTs with interfaces. For example, there are interfaces for List, Set and Map which are ADTs that we'll be discussing this quarter (we've already been examining the List concept). In addition to the interfaces, there are various implementations of each. For example, ArrayList and LinkedList both implement the List interface.

Bloch's book is written as a series of 57 suggested practices for Java programmers. I have a link to the book and the suggested practices from our class web page (under "useful links"). His item 34 is to "Refer to objects by their interfaces." The passage I quoted was, "you should favor the use of interfaces rather than classes to refer to objects. If appropriate interface types exist, parameters, return values, variables and fields should all be declared using interface types." This last sentence was in bold face in the book, indicating how important Bloch thinks this is, and I've reproduced that here. He goes on to say that, "The only time you really need to refer to an object's class is when you're creating it."
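A short sketch of this practice using the List interface and its two implementations mentioned above is shown here. The class name InterfaceTypeDemo and the method totalLength are hypothetical; the point is only that the class names ArrayList and LinkedList appear after the word "new" and nowhere else.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: names other than the java.util types are hypothetical.
class InterfaceTypeDemo {
    // The parameter is declared as List, so any implementation can be passed in.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Variables use the interface type; the class name appears only after "new".
        List<String> arrayBacked = new ArrayList<String>();
        List<String> linkBacked = new LinkedList<String>();

        arrayBacked.add("hello");
        arrayBacked.add("there");
        linkBacked.addAll(arrayBacked);

        System.out.println(totalLength(arrayBacked));  // 10
        System.out.println(totalLength(linkBacked));   // 10
    }
}
```

Switching an implementation later means changing one constructor call rather than every declaration that mentions the variable.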

You'll find almost identical language in my assignment writeup for Sieve. I ask you to declare all variables using the Queue interface and say that you should use the LinkedQueue class name only when it is constructed (i.e., only after the word "new"). We will be taking points off when people don't follow this rule, both because it's an important point to understand about interfaces versus implementations and because it is an important style issue, which is why Bloch included it in his book.

I found I had only ten minutes left to continue my discussion of complexity. On Monday we had seen three different algorithms for solving a particular task. We found that they had dramatically different behavior when we ran them.

I pointed out that I see a lot of undergraduates who obsess about efficiency and I think that in general it's a waste of their time. Many computer scientists have commented that premature optimization is counterproductive. Don Knuth has said that "Premature optimization is the root of all evil." The key is to focus your attention where you can get real benefit. The analogy I'd make is that if you found that a jet airplane weighed too much, you might decide to put the pilot on a diet. While it's true that in some sense every little bit helps, you're not going to make much progress by trying to slim down the pilot when the plane and its passengers and cargo weigh so much more than the pilot. I see lots of undergraduates putting the pilot on a diet while ignoring much more important details.

In the case of our three algorithms from Monday, I think some people would choose the first or second approach and would torture themselves to get it to run as quickly as possible. But the real breakthrough comes from choosing the third approach that had a much better growth rate than the other two.

In terms of the growth rate of different algorithms, I mentioned that some of your intuitions from calculus will be helpful. You've probably been asked to solve problems like figuring out what the limit is, as n approaches infinity, of an expression like this:

        n^3 - 18 n^2 + 385 n + 708
        --------------------------
        0.005 n^4 - 13 n^2 + 73842

When you solve a limit like this, you ignore things like coefficients and you ignore small terms. What matters here is that you basically have:

        n^3
        ---
        n^4

The rest is noise. So this is something that you know is going to approach 0 because eventually the n^4 will dominate the n^3 no matter what the coefficients and lower-order terms are. We use similar reasoning with complexity. We ignore constant multipliers and we ignore lower order terms to focus on the main term.
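The claim can be checked numerically with a small sketch like the one below, which evaluates the full expression for growing n and prints it next to 200/n, the value of n^3 / (0.005 n^4) with the coefficient kept. The class name GrowthRatioDemo and the method name ratio are made up for this example.

```java
// Illustrative sketch: the class and method names here are hypothetical.
class GrowthRatioDemo {
    // Evaluates (n^3 - 18 n^2 + 385 n + 708) / (0.005 n^4 - 13 n^2 + 73842).
    static double ratio(double n) {
        double numerator = n * n * n - 18 * n * n + 385 * n + 708;
        double denominator = 0.005 * n * n * n * n - 13 * n * n + 73842;
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] sizes = {1e2, 1e4, 1e6, 1e8};
        for (double n : sizes) {
            // For large n the two printed values get closer together,
            // and both shrink toward 0.
            System.out.println("n = " + n
                    + "  full expression = " + ratio(n)
                    + "  200/n = " + (200.0 / n));
        }
    }
}
```

The coefficient 0.005 changes how large n must be before the n^4 term takes over, but not the fact that it eventually does.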

I said that different algorithms naturally fall into different complexity classes:

O(1): Constant time algorithms that don't depend on "n".
O(log n): Logarithmic algorithms like binary search that we used in SortedIntList.
O(n): Linear algorithms like linear search that we used to implement indexOf in the original IntList class.
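The three classes listed above can be illustrated with a small sketch on a plain int array. This is not the course's IntList or SortedIntList code; the class SearchDemo and its method names are assumptions made for the example.

```java
// Illustrative sketch: not the course's IntList or SortedIntList classes.
class SearchDemo {
    // O(1): one array access, no matter how long the array is.
    static int first(int[] data) {
        return data[0];
    }

    // O(n): may have to look at every element before giving up.
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // O(log n): halves the remaining range each step; requires sorted input.
    static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(first(sorted));             // 1
        System.out.println(linearSearch(sorted, 7));   // 3
        System.out.println(binarySearch(sorted, 7));   // 3
        System.out.println(binarySearch(sorted, 4));   // -1
    }
}
```

On an array of a million elements, the linear search may examine all million values, while the binary search needs about twenty comparisons.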

We ran out of time, so I had to cut this discussion short, but we'll return to this topic later and discuss other typical complexity classes.

Stuart Reges

Last modified: Thu Oct 27 16:32:07 PDT 2005