CSE341 Notes for Monday, 4/15/24

I continued our discussion of the binary search tree example (int_tree). I began by asking for the big-O complexity of our insert_all function. It simply calls the insert function for each of the n values in a list. So how much work does insert perform? We are still limiting ourselves to the immutable constructs in OCaml. That means that our binary trees are constructed in a very different way than what we saw in the 123 and 143 intro classes. There we would descend the tree until we encountered a null reference and we would replace that with a reference to a node. That relies on the ability to mutate a node to replace an existing reference with a new reference.

Consider the code we have for insert:

        let rec insert(value, tree) =
            match tree with
            | Empty                   -> Node(value, Empty, Empty)
            | Node(root, left, right) ->
                if (value <= root) then Node(root, insert(value, left), right)
                else Node(root, left, insert(value, right))
This code also involves descending the tree, but at each level we end up creating a new node that has either a different left subtree or a different right subtree. OCaml will share the part that isn't changed, but it has to make a new node to keep track of what has changed. So a single insertion is going to be O(log n) if the tree is balanced and it will involve creating log n new nodes for each insertion. Our trees aren't guaranteed to be balanced, but we've seen that with randomized data it tends to be close. So this insert takes the same amount of time as a Java insert, but it creates more nodes because it can't mutate an existing node. Overall we'd say that inserting a list of n values will be O(n log n) when we have data that leads to a tree that is close to being balanced.

Then I asked people how we'd write a method called contains that takes a value n and a tree and that returns true if the value is in the tree and false otherwise. Someone quickly pointed out the base case for the empty tree that it doesn't contain anything:

        let rec contains(tree, n) =
            match tree with
            | Empty -> false
            ...
Remember that you typically want a different case for each of your different type constructors. The case above handles the empty tree, so we'll also need a case for a nonempty tree:

        let rec contains(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) -> 
                ...
Someone said that if the root data is equal to n, then the answer is true. If not, someone said we could see if it is in either subtree:

        let rec contains(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) -> 
                root = n || contains(left, n) || contains(right, n)
This code works, but it's not very efficient. It would potentially search the entire tree. Remember that we are working with a binary search tree. So the better thing to do is to check either the left subtree or the right subtree, but not both:

        let rec contains2(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) ->
                if root = n then true
                else if n < root then contains2(left, n)
                else contains2(right, n)
This version turns out to be quite efficient, which I demonstrated in the OCaml interpreter by typing:

        let t = insert_all(random_numbers(1000000))
        filter((fun x -> contains2(t, x)), 1--100000)
The request for 100 thousand calls on contains2 executed almost without pause.

As another example, I asked people how we could convert the binary search tree into a sorted list of ints. This took people a few minutes to figure out, but eventually we came to the realization that an inorder traversal of the tree will produce the values in sorted order, so we simply need to collapse it using an inorder traversal.

The easy case is to collapse an empty tree, which gives you an empty list:

        let rec collapse(tree) =
            match tree with
            | Empty -> []
            ...
As usual, we'll need a case for the nonempty tree:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) -> ...
In this case we want to recursively collapse the left and right subtrees and glue the two pieces together with the root data in the middle. Someone suggested doing it this way:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) ->
                collapse(left) @ root::collapse(right)
This code works well, but I have a slight preference in this case for expressing it as the appending of three different lists:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) ->
                collapse(left) @ [root] @ collapse(right)
The first version is better in the sense that it demonstrates an understanding of the difference between the cons operator(::) and the append operator(@), but I prefer the second because I conceive of the problem as putting together three different things. Both are perfectly fine ways to write the code.

We previously wrote a function to determine the height of a tree. A related notion is the depth of a given value stored in the tree. Consider, for example, this tree:

      18
        \
         27
        /
       9
What is the depth of the node with 9 in it? We compute it by finding the length of the path from the root to the node, but there are two things we could count: nodes or edges. For computing height, I'm a node counter and would say that this tree has a height of 3. That's how we wrote the function previously. At least I have Don Knuth on my side for that. I'm a node counter for height because I want to have a good answer to the question, "What is the height of the empty tree?" For me, the answer is 0. For edge counters, it's either -1 or undefined or "I'm not sure."

But for the depth of a specific value, I prefer the standard definition of counting edges because we don't have any odd situations for the empty tree (it has no nodes, so none of them have a depth). So I'd say that in this tree 18 has a depth of 0, 27 has a depth of 1, and 9 has a depth of 2.

Then I asked how we'd write a function to find the depth of a given value in a search tree. We usually start with an empty tree as the base case, but it's not clear what to return. What does it mean if you have gotten to an empty tree? It means the value wasn't found in the tree. Someone suggested we could return -1 in that case, the way we do for calls on indexOf in a language like Java when it doesn't find a value in a list:

        let rec depth_of(n, tree) =
            match tree with
            | Empty -> -1
            ...
If the value stored at the root is n, then this value has a depth of 0 (it appears in level 1 of the tree):

        let rec depth_of(n, tree) =
            match tree with
            | Empty                   -> -1
            | Node(root, left, right) ->
                if root = n then 0
            ...
What if it's not at the root? It's a binary search tree, so to be efficient, we should either search the left subtree or the right subtree, but not both:

        let rec depth_of(n, tree) =
            match tree with
            | Empty                   -> -1
            | Node(root, left, right) ->
                if root = n then 0
                else if n < root then 1 + depth_of(n, left)
                else 1 + depth_of(n, right)
I had a list of ints that I used to define a variable "test" that we could use for testing:

        let test = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]
Then I defined a variable "t" by calling the insert_all function we wrote in the previous lecture to produce a binary search tree.

        # let t = insert_all(test);;
                val t : int_tree =
                  Node (341,
                   Node (143,
                    Node (72,
                     Node (-8, Empty,
                      Node (0, Empty,
                       Node (15, Empty, Node (72, Node (40, Empty, Empty), Empty)))),
                     Node (103, Node (95, Empty, Empty), Empty)),
                    Node (272, Empty, Empty)),
                   Node (413, Empty, Empty))
I tried loading our definition for depth_of into the interpreter and using this variable t we could see that it worked fairly well:

        # map((fun x -> depth_of(x, t)), test);;
        - : int list =
        [7; 2; 5; 4; 3; 4; 3; 2; 2; 1; 1; 0]
But we ran into problems when we asked about a value not in the tree:

        # depth_of(42, t);;
        - : int = 7
We were expecting a value of -1. How did we end up with 7? The answer is that our solution to depth_of descends the tree looking for the given value and then adds 1 to the result as it comes back out. This value of 42 would become the right child of the leaf node that has 40 in it. Remember that 40 had a depth of 8. So when we find that its right child is empty, we return a -1 and then add one to the result 8 different times to get an overall result of 7.

We could fix this by converting this into a tail recursive function with an accumulator so that when we reach an empty tree, there is nothing left to do but to return that special value -1. Another option would be to raise an exception. So we could imagine that we have a precondition on the function that they shouldn't call it if the value isn't in the tree. That doesn't seem like a very friendly thing to do, though. Someone said they could call contains before calling depth_of, but that means they have to search the tree twice: once to see if it's there, and a second time to find its depth.

OCaml offers a good alternative. This is a good place to use what is known as an option type. An option is appropriate when the answer is "0 or 1 of" something. In other words, sometimes there is an answer and sometimes not. The technical term for this is a "one of" type (versus a tuple where you have to include all values, which is called an "each of" type).

This situation comes up often in programming and there are many different approaches for addressing it. For the Scanner class in Java if you try to read a value when you have reached the end of a file, it throws an exception. There are other reading operations in the Java class libraries that return -1 when there is no data to read. When you call get on a key that isn't in a map, it returns the value null.

What is the maximum value in an empty list? There isn't one. This is a similar case. What is the depth of a value n in the tree? Sometimes there is an answer and sometimes the answer is, "There isn't one."

The option type is defined as follows:

        type 'a option = None | Some of 'a 
Notice the use of 'a, which means that it is polymorphic (you can have an int option, string option, float option, etc). Using these two constructors, I tried to rewrite our definition:

        let rec depth_of(n, tree) =
            match tree with
            | Empty                   -> None
            | Node(root, left, right) ->
                if root = n then Some 1
                else if n < root then 1 + depth_of(n, left)
                else 1 + depth_of(n, right)
This produced an error. It's the same problem we had with the attempt to return -1. We have to be consistent. The recursive calls on left and right are adding 1 to the result being returned. You can't add one to an option and we can't return an option in one case and not the other. And we can't extract the value from the option because we don't know that the recursive call will succeed. So we have to convert this into a tail recursive version by introducing a helper function with an accumulator that keeps track of the current depth:

        let depth_of(n, tree) =
            let rec helper(t, depth) =
                match t with
                | Empty                   -> None
                | Node(root, left, right) ->
                    if root = n then Some depth
                    else if n < root then helper(left, depth + 1)
                    else helper(right, depth+ 1)
            in helper(tree, 0)
Obviously we will sometimes want to turn an option result into an actual value. You can do so using the function Option.get. For example:

        # let result = depth_of(t, 40);;
        val result : int option = Some 8
        # let d = Option.get(result);;
        val d : int = 8
There are also functions Option.is_some and Option.is_none that can be used to test which kind of option you have. More often we find ourselves using pattern matching with the None and Some constructors.

As one final example I said that I wanted to write a function list_max that would return the maximum value in a list. Using the option constructors, we can define base cases for an empty list and a one-element list:

        let rec list_max(lst) =
            match lst with
            | []    -> None
            | [x]   -> Some(x)
            ...
It's tempting to return x in the one-element list case, but we got an error message when we did so because then OCaml would assume that x is an int option instead of an int. For the final case, we want to compute the max of the rest of the list only once, so we introduced a let expression:

        let rec list_max(lst) =
            match lst with
            | []    -> None
            | [x]   -> Some(x)
            | x::xs -> 
              let max = list_max(xs)
              in if x > Option.get(max) then Some(x)
                 else max
    
It might seem odd that we're calling Option.get in the final case without testing whether there is a value to get, but keep in mind that we know that xs has at least one value in that branch, so we know that we will get a value returned.


Stuart Reges
Last modified: Tue Apr 16 13:23:41 PDT 2024