CSE341 Notes for Wednesday, 4/3/24

In the previous lecture we wrote a function called split that takes a list as a parameter and that splits it into two lists by taking values alternately from one list or the other. For example:

        # split(1--10);;
        - : int list * int list = ([1; 3; 5; 7; 9], [2; 4; 6; 8; 10])
Then we talked about how to write a function that would return the list obtained by merging two sorted lists into one sorted list. We had base cases for one or the other list being empty:

        let rec merge(lst1, lst2) =
            match (lst1, lst2) with
            | ([], ys) -> ys
            | (xs, []) -> xs
            ...
Notice that in this case we are matching a tuple with the match expression rather than a simple variable. We considered a case where both lists are empty, but we concluded that the first case takes care of that (actually either case takes care of it, but the first case is the one that will end up handling it). We then considered the case where each list has at least one value:
        let rec merge(lst1, lst2) =
            match (lst1, lst2) with
            | ([], ys)       -> ys
            | (xs, [])       -> xs
            | (x::xs, y::ys) -> ...
Someone said we test whether x is less than y. Given that test, we either put x or y at the front of the answer and we recurse on the tail of the list with the smaller value and the complete other list:

        let rec merge(lst1, lst2) =
            match (lst1, lst2) with
            | ([], ys) -> ys
            | (xs, []) -> xs
            | (x::xs, y::ys) -> if x < y then x::merge(xs, lst2)
                                 else y::merge(lst1, ys)
This version of the function worked fine.

Then we talked about how to implement the merge sort algorithm. In doing so, we can use the split function discussed in the prior lecture. Recall that it takes a list as a parameter and it returns a tuple of two lists, each with half of the values from the original list.

In the general case, we split the list, sort the two sublists and then merge the two sorted lists. We used a let expression to introduce variables for the two lists that come back from a call on split:

        let rec merge_sort(lst) =
            let (lst1, lst2) = split(lst)
            in ...
If you think procedurally, you might think of it as three more steps:

  1. sort the first sublist
  2. sort the second sublist
  3. merge the two together
We want to express this in a more functional way using a single expression:

        let rec merge_sort(lst) =
            let (half1, half2) = split(lst)
            in merge(merge_sort(half1), merge_sort(half2))
We tried running this version of the code and found that it didn't work. It went into infinite recursion. The problem is that we have no base cases. We need a base case for an empty list and a base case for a 1-element list.

        let rec merge_sort(lst) =
            match lst with
            | []  -> []
            | [x] -> [x]
            | _   -> let (half1, half2) = split(lst)
                     in merge(merge_sort(half1), merge_sort(half2))
I used the underscore wildcard for the final case as a way of saying "in all other cases." It's like saying "else". This version worked well.

Then we discussed tail recursion. I began by showing this function that adds up 1 through n:

        let rec sum(n) =
            if n = 0 then 0
            else n + sum(n - 1)
This worked fine for smaller values of n, but when we asked it to compute the sum up to 10 million, it didn't return an answer. I said that I wanted to explore a variation that is written in a tail recursive way. I asked if anyone remembered what tail recursion involves and someone said it's like a for loop or while loop. That's not a bad intuition because we know that tail recursive solutions are easily rewritten as loops that don't involve recursion.

I asked people how we would solve this problem in Java. We'd write code like this:

        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
Several times I've tried to make the point that you can turn this kind of loop code into a functional equivalent. If it was useful for the loop to have an extra variable for storing the current sum, then we can do the same thing with a helper function. We can have a 2-argument function that keeps track of the current sum in addition to the value of i. Using that idea, I wrote the following variation of sum:

        let sum2(n) =
            let rec helper(next, result) =
                if next = 0 then result
                else helper(next - 1, next + result)
            in helper(n, 0)
Notice that the helper function is recursive but the overall function is not. When we loaded this into the interpreter we found that it was able to compute the sum up to 10 million quickly. It even quickly computed the sum up to 100 million. So why does this version run so much faster than the other? Consider how each version computes its answer. This is how sum(5) is computed:

        sum(5) =
        5 + sum(4) =
        5 + 4 + sum(3) =
        5 + 4 + 3 + sum(2) =
        5 + 4 + 3 + 2 + sum(1) =
        5 + 4 + 3 + 2 + 1 + sum(0) = 15
Notice how the computation expands as we make recursive calls. After we reach the base case, we'll have a lot of computing left to do on the way back out. But notice the pattern for the second version:

        sum2(5) =
        helper(5, 0) =
        helper(4, 5) =
        helper(3, 9) =
        helper(2, 12) =
        helper(1, 14) =
        helper(0, 15) = 15
There is no expansion to the computation. The key thing to notice is that once we reach the base case, we have the overall answer. There is no computation left as we come back out of the recursive calls. This is a classic example of tail recursion. By definition, a tail recursive function is one that performs no additional computation after the base case is reached.

This extra variable used to keep track of the sum is sometimes refered to as an accumulator. It is well known that tail recursive functions are easily written as a loop. Functional languages like Scheme and OCaml optimize tail recursive calls by internally executing them as if they were loops (which avoids generating a deep stack of function calls).

Then I asked people how to write a function that would return a list in reverse order. For example, the call reverse(1--4) should return [4; 3; 2; 1]. We use the familiar pattern match of an empty list and a non-empty list, but we had to think about what to do with the "x" at the front of the list we are processing. Someone suggested using the cons operator (::) to add it to the end of the list.

        let rec reverse(lst) =
            match lst with
            | []    -> []
            | x::xs -> reverse(xs)::x
The problem is that :: expects a value followed by a list and this has it in the reverse order. Someone mentioned that we could use the append operator instead:

        let rec reverse(lst) =
            match lst with
            | []    -> []
            | x::xs -> reverse(xs) @ [x]
This version works, but it is inefficient. To explain why, I spent a few minutes discussing how the :: (cons) and @ (append) operators work in OCaml.

Consider the following bindings:

        let x = [1; 2; 3]
        let y = 0::x
        let z = x @ [4]
OCaml stores lists internally as a linked list of nodes that each have data and a reference to the next node (these correspond to the head and the tail). So the structure is similar to what we get with standard Java linked list nodes:

        public class ListNode {
            public int data;
            public ListNode next;
            ...
        }
So when we execute:

        let x = [1; 2; 3]
OCaml creates a list of 3 nodes:

        x --> [1] --> [2] --> [3]
What happens when we execute the second binding?

        let y = 0::x
Recall that the :: operator is referred to as "cons," which is short for "construct." In other words, it constructs a new list element:

        y --> [0] --> ??
This new list element has 0 as the head, but what does it use for the tail? OCaml could make a copy of the list that x refers to, but that's not what happens. instead, it sets up a link that shares the memory set aside for x:

             x --> [1] --> [2] --> [3]
                    ^
                    |
        y --> [0] --+
In the Java universe this would be a bad idea. If you share part of the list, then you're likely to end up with a confusing outcome. For example, what if you use the variable y to traverse the list and you change the second and third values in the list. The second and third values in the list that y refers to are the first and second values in the list that x refers to. So in that case, changing y would change x as well. Normally we'd want to avoid that kind of interference.

This is where the concept of mutable state comes into play. In OCaml, lists are immutable. Once you have constructed a list element with a particular head and tail, you can never change them. So it's not dangerous to allow this kind of sharing because OCaml prevents this kind of interference. This is an example where the choice to prevent mutation has made it easier for OCaml to be efficient about the allocation of space.

You can simulate this in Java as well. To get immutable lists in Java, you'd make the fields final:

        public class ListNode {
            public final int data;
            public final ListNode next;
            ...
        }
But what about the final binding?

        let z = x @ [4]
This is a case where OCaml can't make use of sharing. The variable x refers to a list that ends with 3. If you tried to change it to instead point to a new list element storing 4, then you'd be damaging the original list. So this is a case where OCaml has no choice but to make a copy of the contents of x:

        z --> [1] --> [2] --> [3] --> [4]
A simple rule of thumb to remember is that the :: operator always executes in O(1) time (constant time) because it always constructs exactly one list element while the @ operator runs in O(n) time where n is the length of the first list because that first list has to be copied.

So let's return to the inefficient version of the reversing function:

        let rec reverse(lst) =
            match lst with
            | []    -> []
            | x::xs -> reverse(xs) @ [x]
Because we are using the @ operator, we are going to be creating lots of list copies. In effect, to reverse a list of n elements, we're going to make a copy of a list of length 1, and a copy of a list of length 2, and a copy of a list of length 3, and so on, ending with making a copy of a list of length n-1. That will require O(n2) time. We saw that when we made a call on this version of reverse passing it a list with 10 thousand elements (1--10000).

We can do better than that. I asked people how you'd approach it iteratively. We came up with this pseudocode:

        set result to an empty list
        while (list we are reversing is not empty) {
           x = remove first element of list we are reversing
           add x to result list
        }
You can translate an iterative process like this in a functional equivalent by thinking about the different states that this computation goes through. There are two different variables involved here: list and result. Here's how they change as you iterate through the loop assuming we are working with the list 1--4.:

        list            result
        ----------------------------
        [1; 2; 3; 4]    []
        [2; 3; 4]       [1]
        [3; 4]          [2; 1]
        [4]             [3; 2; 1]
        []              [4; 3; 2; 1]
Instead of having two variables that change in value each time you iterate through the loop (the mutable state approach), you can instead have a function of two arguments where each time you call the function you compute the next pair of values to use in the computation. So we'll write this using a helper function:

        let reverse(lst) =
            let rec helper(rest, result) = ??
            in helper(??)
The loop starts with list being the overall list and result being empty. In the functional version, we make this the initial call on the helper function:

        let reverse(lst) =
            let rec helper(rest, result) = ??
            in helper(lst, [])
The loop ends when list becomes empty, in which case the answer is stored in result, so this becomes one of the cases for our helper function:

        let reverse(lst) =
            let rec helper(rest, result) =
                match rest with
                | [] -> result
                ...
            in helper(lst, [])
Now we just need a case for the other iterations. In the pseudocode, we pulled an x off the front of the list and moved it to result. We can accomplish this with a pattern of x::xs for the list and by moving x into the result in our recursive call:

        let reverse(lst) =
            let rec helper(rest, result) =
                match rest with
                | []    -> result
                | x::xs -> helper(xs, x::result)
            in helper(lst, [])
We saw that this version worked and ran in a reasonable amount of time even for lists with a million values.

I ended the lecture by writing a function called qsort that uses the quicksort algorithm to sort a list. We began with our usual question about an easy list to work with and someone said we should have a case for empty list, so we began with:

        let rec qsort(lst) =
            match lst with
            | [] -> []
Then I asked if anyone remembered how quicksort works. Someone mentioned that the main steps are to:

  1. Pick a value from the list that we refer to as the pivot.

  2. Split the list into 2 parts: values less than the pivot and values greater than the pivot. I said that this step is often referred to as partitioning the list and the two parts are often referred to as partitions. I mentioned that we have to include the possibility of values equal to the pivot, although it doesn't matter which partition we put them into.

  3. Quicksort the two partitions.

  4. Put the pieces together.

One nice thing about quicksort is that putting the pieces together isn't difficult. We know that the values in the first partition all come before the values in the second partition, so it's just a matter of gluing the pieces together.

I said that we'd keep things simple by using the first value in the list as our pivot. This isn't an ideal choice, especially if the list is already sorted, but it will work well for the randomized lists we want to work with. So I changed the variable names in our code to reflect this choice:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
The first step in solving this is to partition the list, so I asked people how to implement this in OCaml and people seemed to be stumped. That's not surprising because we're just starting to learn OCaml and this is a nontrivial computation. I said that you don't have to abandon your programming instincts from procedural programming. So how would you solve it in a language like Java if you were asked to work with a linked list?

By thinking through that, we developed the following pseudocode for partitioning the list:

        start with an empty lst1
        start with an empty lst2
        while (more values from original list)
            look at first element
            if it is <= pivot, put it in lst1
            else put it in lst2
        }
        finish up
I said that you can convert this iterative solution to a recursive solution without much trouble. With a loop-based approach you use a set of local variables to keep track of the current state of your computation. Here there are three such local variables: the list that we are partitioning, the first partition, and the second partition. Local variables like these become parameters to a helper function. Our helper function is supposed to split the list, so we decided to call it "split". Since the loop involves three variables, two of which are initialized to be empty lists, we write split as a helper function of three parameters and we include empty lists for two of the parameters in the initial call:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) = ?
                 in split(rest, [], [])
Notice how the call after the word "in" exactly parallels the situation before our loop begins executing. Our three state variables are the list of values to split (which I have called "rest") and two variables for storing the two partitions which are initially empty.

In our pseudocode we continue until the overall list becomes empty, at which time we "finish up" the computation. We can include this as one of the cases for our helper function:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | [] -> finish up
                 in split(rest, [], [])
Compare this initial case for split versus the initial call for split. We go from having these three values for our computation:

        (original list, [], [])
to having this set of values for our computation:

        ([], partition 1, partition 2)
In other words, we go from having all of the values stored in the original list and having two empty partitions to having an empty list of values and two partitions that have been filled in. Now we just have to describe how we go from one to the other. In our pseudocode, each time through the loop we handled one element of the original list, either moving it to partition 1 or moving it to partition 2. We can do the same thing with our helper function. So first we need to indicate that in the second case for split, we want to process one value from the original list. We do so by including a pattern for lst with "x::xs":

.

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> finish up
                     | x::xs -> ?
                 in split(rest, [], [])
We had an if/else in the loop and we can use an if/else here. I asked people if they wanted to include values equal to the pivot in the first or second partition and the first person to answer said to put them in the first partition, so we expanded this to:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> finish up
                     | x::xs -> if x <= pivot then (x goes in 1st partition)
                                else (x goes in 2nd partition)
                 in split(rest, [], [])
If x belongs in the first partition, then we want to go from having this set of values:

        (x::xs, part1, part2)
to having this set of values:

        (xs, x::part1, part2)
In other words, we move x from the list of values to be processed into the first partition. In the loop, we'd then come around the loop for the next iteration. In a recursive solution, we simply make a recursive call with those new values:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> finish up
                     | x::xs -> if x <= pivot then split(xs, x::part1, part2)
                                else (x goes in 2nd partition)
                 in split(rest, [], [])
In the second case we do something similar, moving x into the second partition and using a recursive call to continue the computation:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> finish up
                     | x::xs -> if x <= pivot then split(xs, x::part1, part2)
                                else split(xs, part1, x::part2)
                 in split(rest, [], [])
The only thing we had left to fill in was the "finish up" part. We often return an empty list when the list we are processing becomes empty. While that might be an appropriate choice for a splitting utility, it's not what we want here. Remember that reaching that point in the code is like finishing the loop in our pseudocode. We started out by saying that we need to quicksort the two partitions. That needs to be part of what we do in the "finish up" part:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> need to: qsort(part1), qsort(part2)
                     | x::xs -> if x <= pivot then split(xs, x::part1, part2)
                                else split(xs, part1, x::part2)
                 in split(rest, [], [])
The qsort function returns a list, so what do we do with the two sorted lists that come back from these recursive calls? We glue them together with append:

         let rec qsort(lst) =
             match lst with
             | []          -> []
             | pivot::rest ->
                 let rec split(lst, part1, part2) =
                     match lst with
                     | []    -> qsort(part1) @ qsort(part2)
                     | x::xs -> if x <= pivot then split(xs, x::part1, part2)
                                else split(xs, part1, x::part2)
                 in split(rest, [], [])
This is not quite right, though, because we haven't accounted for the pivot. We didn't add it to either of our partitions. That's, in general, a good thing, because it guarantees that each of the partitions we recursively pass to qsort will be smaller than the original list. But it means we have to manually place the pivot into the result. It belongs right in the middle of the two partitions. We could use a cons operator ("::") to indicate this, but I think it's a little clearer in this case to show that we are gluing three pieces together, one of which is the pivot itself:

        let rec qsort(lst) =
            match lst with
            | []          -> []
            | pivot::rest -> 
                let rec split(lst, part1, part2) =
                    match lst with
                    | [] -> qsort(part1) @ [pivot] @ qsort(part2)
                    | x::xs -> if x <= pivot then split(xs, x::part1, part2)
                               else split(xs, part1, x::part2)
                in split(rest, [], [])
At that point we were done. This is a working version of quicksort. I loaded it in the OCaml interpreter and we managed to sort some short lists. It is polymorphic, so it can be used to sort lists of any type.


Stuart Reges
Last modified: Fri Apr 5 11:48:57 PDT 2024