CSE341 Notes for Friday, 3/29/24

I began by mentioning that from a historical point of view, ML (parent of OCaml) was the first programming language to have what is known as parametric polymorphism. When we ask OCaml about functions like hd or tl, we get strange notations that involve "'a":

        # List.hd;;
        - : 'a list -> 'a = <fun>
        # List.tl;;
        - : 'a list -> 'a list = <fun>
The more modern term for this is generics It is similar to the way that we define an ArrayList<E> in Java. The "E" is a type parameter and indicates that we can make many different kinds of ArrayList objects (ArrayList<String>, ArrayList<Integer>, etc). In a similar manner, OCaml is letting you know that the List.hd and List.tl functions can act on many different kinds of lists (string list, int list, etc).

I also mentioned that a good way to understand the list type in OCaml is to recognize that it is very much like the linked structures we studied in the introductory class. For example, you would define a generic node as follows:

        public class ListNode<E> {
            public E data;         // data stored in this node
            public ListNode next;  // link to next node in the list
        }
The "data" field is what you get when you call List.hd and the "next" part is what you get by calling List.tl.

I then spent time discussing the concept of mutable state. We try to avoid mutable state in functional programming. This will seem odd at first, because it is such a central technique in procedural programming that it's difficult to imagine how you can program without it. For example, in a language like Java we often have an integer variable n that we will increment by saying something like n++. This is a very typical example of mutable state. We have a memory location for the variable x that we change (or mutate) over time.

At first glance, OCaml variables appear to have the same ability. After all, we can say:

        # let x = 3;;
        val x : int = 3
        # let x = x + 1;;
        val x : int = 4
But this has a very different effect in OCaml. You should think of an OCaml program as a sequence of bindings that are stored in an environment. The code above introduces two different bindings for the variable x. The second binding makes use of the value from the first binding, but this is very different from sharing a single memory location that different code segments can all refer to. I said that this distinction is difficult to understand at first, but it turns out to be very important.

I asked if anyone could think of a situation where it might matter that both bindings are preserved and someone said that it could matter if you define a function that refers to x, as in:

        # let x = 3;;
        val x : int = 3
        # let f(n) = n * x;;
        val f : int -> int = <fun>
        # f(5);;
        - : int = 15
        # let x = x + 1;;
        val x : int = 4
        # f(5);;
        - : int = 15
The function f refers to x but doesn't have a local variable called x, which means it's going to look to the global environment. We refer to such a variable as a "free variable" and we'll have a lot to say about free variables later. But for now, just notice that the call on f(5) gives the same result even when we introduce a second binding for x. The function uses the binding that was current at the time it was defined.

I asked people to consider whether immutability is good. I mentioned that Joshua Bloch, the architect of the Java class libraries, has written a book called Effective Java in which he gives a series of "tips" for programming well in Java. Item 17 is "Minimize mutability." For example, String objects are immutable in Java. I asked people what that means and whether it's a good thing.

What it means is that once a String object is constructed, it can never be changed. At first, this seems like a dangerous and inefficient decision. For example, this loop generates 1001 different String objects just to put together a String of 1000 stars:

        String s = "";
        for (int i = 0; i < 1000; i++) {
            s += "*";
        }
I mentioned that Java has alternatives called StringBuffer and StringBuilder that don't have this inefficiency. But why do this with String?

Someone mentioned that it can be problematic to have two different variables pointing to the same object if they have the ability to change that object. The two different variables can interfere with each other. For example, we're all used to writing Java constructors that take String objects as parameters:

        public Employee(String name, ...) {
            this.name = name;
            ....
        }
This is a potentially dangerous operation. Someone can give you a String that you remember as the name of the employee and they can turn around and modify the String. For example, I might try to trick a program into thinking that I'm Bill Gates and later change the String to my name instead (e.g., having it send a big paycheck to me instead of Bill). If Strings were mutable, we'd find ourselves wanting to make a defensive copy in a constructor like this. That's what we tend to do in languages like C++.

This doesn't even have to be malicious. For example, someone might be using a mutable single String to create a series of different Employee objects. So they change the value of the name string and pass the new name. If you just store a reference to that String, then you end up with a series of Employee objects that all have the same name.

But there are even further problems. We're also used to writing code like this in Java:

        public String getName() {
            return name;
        }
This is another place where you'd want to make a copy of the String if it was mutable because otherwise you're allowing the person who calls your method to have access to your mutable String. They might maliciously or accidentally damage the String.

Joshua Bloch's item 50 is to "Make defensive copies when needed" to solve exactly these kind of problems. We don't have to worry about that for type String because it is immutable. So this is at least one example of where immutability is helpful. In general, immutability eliminates many potential programming problems.

As another example, I asked people to consider a function f that returns an int and I asked under what circumstances we can replace this expression:

        f(x) + f(x)
with this expression:

        2 * f(x)
Someone said you can do the replacement if "f doesn't do anything else." That's a good way to look at it. Another phrase that is used for this is that we say that f has no side effects.

What kind of side effects might it have? It might change the value of a global variable that is used in the computation. For example, it might do something like this:

        globalCount++;
        return globalCount * x;
In that case, the second call on the function will return a different value than the first call because the computation depends on a variable whose value has changed. This is an example of the problems introduced by mutable state. If you use the simple mechanisms in OCaml, you won't get this kind of interference. There is no way to use simple variable binding, for example, to effect a global variable like this. OCaml does have some language elements that allow you to do this (references and arrays), but those are considered the "bad" mutable part of OCaml (the bad part of town).

Another case where this would make a difference is if the function produces output. For example, in Java if it called System.out.println, then you'll get different behavior by calling it twice versus calling it once. This is another kind of side effect and we'd call it another example of mutable state (changing the state of the output stream). OCaml has functions for reading and writing and they, also, are considered aspects of OCaml that detract from the purely functional approach.

There is a technical term for the ability to replace the sum of the two function calls with 2 times a single call. It's called referential transparency.

I mentioned that OCaml lists are immutable. One way to understand it is that this is like a Java list where the nodes have "final" fields:

        public class ListNode<E> {
            public final E data;         // data stored in this node
            public final ListNode next;  // link to next node in the list
        }
As a result, OCaml can avoid making copies of lists. For example, suppose you have created a rather long list:

        # let lst = [13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7];;
        val lst : int list = [13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7]
Consider the following situations where you create variations of this list:

        # let lst2 = lst;;
        val lst2 : int list = [13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7]
        # let lst3 = List.tl(lst);;
        val lst3 : int list = [7; 45; 9; 42; -3; 15; 2; 3; 5; 7]
        # let lst4 = 0::lst;;
        val lst4 : int list = [0; 13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7]
All of these operations can be performed in constant time (not related to the length of lst) because they can share the structure of lst. I asked people when you couldn't share structure in this way and someone said when you append two lists together:

        # let lst5 = lst @ lst;;
        val lst5 : int list =
          [13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7; 13; 7; 45; 9; 42; -3; 15; 2; 3; 5; 7]
We are asking OCaml to append two copies of the list together. That first copy can't end the way lst does with 7 as the final value. It has to connect to the first value in lst (13). That means we will need to make a copy of all of the elements of lst. We can, however, avoid making a second copy of lst because in that case we can share the original list.

We then wrote a function called count_down that takes an integer n as a parameter and that returns a list of int values starting with n and counting down to 0. For example, count_down(10) should return [10; 9; 8; 7; 6; 5; 4; 3; 2; 1; 0]. I pointed out that our function won't work for negative values of n, so we should document that as a precondition of the function. I have asked students to include precondition comments for homework 1.

        (* pre: n >= 0 *)
        let rec count_down(n) =
            if n = 0 then [0]
            else n::count_down(n - 1)
Then I asked how we could write a count_up function that returns a list starting with 1 and counting up to n. Someone suggested that we could reverse the order in our cons operator (::):

        let rec count_down(n) =
            if n = 1 then [1]
            else count_down(n - 1)::n
This didn't work. Instead of combining a value and a list, we are giving it the value and list in reverse order. Someone suggested that could change this to a call on our append operator:

        let rec count_down(n) =
            if n = 1 then [1]
            else count_down(n - 1) @ [n]
This version worked, but it turned out to be slow. When I asked it to compute count_down(100000) it never finished. Why? It's because it is making copies of very long lists. Think of just the last recursive call where it has to append a list that is 99,999 long with a single value, which requies making a copy of that list that is 99,999 long. This turns what could be an O(n) operation into an O(n^2) operation.

So how do we fix it? Someone suggested that we could introduce a helper function that has both a low and a high parameter:

        let rec helper(low, high) =
            if low > high then []
            else low::helper(low + 1, high)
Then we were able to define count_up in terms of this helper function:

        let count_up(n) = helper(1, n)
Then I introduced a new OCaml concept. We looked at how a let construct can be used to create a local binding that isn't introduced into the overall environment. It's similar to the idea of a local variable in Java that appears inside of a block (i.e., inside a set of curly braces {}). For example, we might say:

        let x = 0.35 in x *. x *. x *. x
This is a convenient way to give the name x to the numeric value we want to use in this expression while also keeping that name local to just this expression. It simplifies the expression without introducing a binding for x in the global environment. The general form of the let construct is:

let <binding> in <expression> We could read it as, "Let the following binding hold in evaluating this expression." Most of the OCaml constructs we are studying evaluate to something. That is true of this new version of the let expression. For example, we can put parentheses around the let expression above and form more complex expressions, as in:

        (let x = 0.35 in x *. x *. x *. x) +. 2.5
We can't do the same thing with the other form of let which simply defines a new binding for the top level environment:

        (let x = 0.35) +. 2.5
This produces an error message because this form of let does not produce a value.

You can include a function binding as well as a variable binding in a let expression. Often we use a let to define a helper function for some other function.

For example, we rewrote our count_up function to define the helper function as a local function that is not added to the top-level environment:

        let count_up(n) = 
            let rec helper(low, high) =
                if low > high then []
                else low::helper(low + 1, high)
            in helper(1, n)
I asked if we could simplify the helper function now that it is nested within the count_up function and someone mentioned that "high" is always equal to "n". So we simplified this to be:

        let count_up(n) = 
            let rec helper(low) =
                if low > n then []
                else low::helper(low + 1)
            in helper(1)
I mentioned that starting with the second homework we will ask students as a style issue to localize helper functions within the function that uses them.

I then mentioned to people that the helper function we wrote for count_up is actually something that I find very useful to have available to me. It gives a sequential list of values given a low and high. I asked if that sounds familiar and someone said it is like the idea of a range. Python, for example, allows you to write loops using a call on a range function.

I said that for our purposes I liked the idea of introducing an infix operator for this, as in:

        # 1--10;;
        - : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
OCaml lets you define infix operators like this. Remember that we refer to the function by putting parentheses around the operator, as we did with the addition operator:

        # (+);;
        - : int -> int -> int = <fun>
I modified the helper function to be an infix operator instead:

        let rec (--) x y =
            if x > y then []
            else x::(x + 1--y)
It may seem odd that we don't put the parameters x and y inside parentheses. That's because in OCaml infix operators don't take a tuple. We will study this later in the quarter (it is called a curried function). But for now, know that we will have the -- operator available to us. I tend to include such functions in a file called utility.ml that I include with my OCaml programs.


Stuart Reges
Last modified: Fri Mar 29 14:15:26 PDT 2024