CSE341 Notes for Friday, 5/17/24

We continued our discussion of blocks and the yield statement. We started with a simple method that yields four times:

        def f
          yield
          yield
          yield
          yield
        end

This method can be used to execute some bit of code four times, as in:

        >> f {puts "hello"}
        hello
        hello
        hello
        hello
        => nil

I said that I think the best way to think of this is that there are two bits of code that switch back and forth. When we call f, we start executing its code, but every time that f calls yield, we switch back to the code passed in the block and execute it. So this method switches back and forth four times.
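
One way to watch the switching happen is to record each step in an array. The sketch below is only an illustration: f_traced is a hypothetical variant of f that yields twice instead of four times and appends to a log array (a name introduced here) so the order of events is visible.

```ruby
# Hypothetical variant of f that records each step in the array it is given.
def f_traced(log)
  log << "f: start"
  yield                       # control switches to the block
  log << "f: between yields"  # and comes back here when the block finishes
  yield
  log << "f: done"
end

log = []
f_traced(log) { log << "block: running" }
puts log
```

The log alternates between entries written by the method and entries written by the block, which matches the picture of two bits of code taking turns.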

It's more interesting when we yield a value:

        def f
          yield 43
          yield 79
          yield 19
          yield "hello"
        end

We can still pass simple code like before and it will execute four times:

        >> f {puts "hello"}
        hello
        hello
        hello
        hello
        => nil

But with this version, we have the option of writing a block that includes a parameter:

        >> f {|n| puts n * 3}
        129
        237
        57
        hellohellohello
        => nil

We can also have yield supply more than one value:

        def f
          yield 43, 17
          yield 79, 48
          yield 19, "bar"
          yield "hello", 39
        end

We can then execute a block that takes two parameters:

        >> f {|m, n| puts m; puts n; puts}
        43
        17
        
        79
        48
        
        19
        bar
        
        hello
        39
        
        => nil

In the example above I use semicolons to separate the three statements. Another way to do this is using the do..end form for a block:

        f do |m, n|
          puts m
          puts n
          puts
        end

We saw that we could even write f so that it sometimes yields one value and sometimes yields two values, as in:

        def f
          yield 43
          yield 79, 48
          yield 19, "bar"
          yield "hello"
        end

If we then write a block that takes two parameters, the second parameter will be set to nil when yield supplies just one value. We can test this to make sure that we do the right thing when the second parameter is nil, as in:

        f do |m, n|
          puts m
          puts n if n
          puts
        end

which produces this output:

        43
        
        79
        48
        
        19
        bar
        
        hello
        

Notice that we don't have to say "if n != nil". In Ruby, nil is treated as false in a test and anything that is not either false or nil is treated as true.
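
A quick way to check this rule is to run a handful of values through an if. This is just a sketch; truthy? is a hypothetical helper name, not a built-in method.

```ruby
# Hypothetical helper: reports whether Ruby treats a value as true in a test.
def truthy?(value)
  if value
    true
  else
    false
  end
end

samples = [nil, false, 0, "", [], "hello"]
results = samples.map { |v| truthy?(v) }
p results   # prints [false, false, true, true, true, true]
```

Only nil and false fail the test. Values like 0 and the empty string, which count as false in some other languages, count as true in Ruby.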

I again pointed out the idea that a block is a closure. For example, suppose you introduce these definitions into irb:

        def g(n)
          return 2 * n
        end
        
        a = 7

The variable a is a local variable and the method g is a method of the main object. And yet, we can refer to these in writing a block:

        >> x = MyRange.new(1, 5)
        => #<MyRange:0xb7ef74d4 @last=5, @first=1>
        >> x.each {|n| puts n + g(n) + a}
        10
        13
        16
        19
        22
        => nil

The each method is in a separate class, so how does it get access to the local variable a and the method g? That works because in Ruby a block keeps track of the context in which it appears, giving you access to any local variables and remembering the value of "self" (the object you were talking to when you defined the block).
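
Here is a self-contained sketch of the same idea that also shows a block changing a local variable from the context where it was written. Repeater is a hypothetical class standing in for MyRange, and total is a variable introduced for the illustration.

```ruby
# Hypothetical class, defined separately from the code that calls it.
class Repeater
  def initialize(times)
    @times = times
  end

  # Yields 0, 1, ..., times - 1 to the caller's block.
  def run
    i = 0
    while i < @times
      yield i
      i += 1
    end
  end
end

def g(n)
  return 2 * n
end

a = 7
total = 0
# The block runs inside Repeater#run, yet it can still read a, call g,
# and update total, all of which belong to the place the block was written.
Repeater.new(3).run { |n| total += n + g(n) + a }
puts total   # prints 30  (7 + 10 + 13)
```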

Then I asked people how we could implement a method that would be like the filter function in OCaml. The idea would be to pass a predicate as a block and to return a list of all values that satisfy the predicate. For example, we might ask for a list of all even numbers in a range by saying:

        x.filter {|n| n % 2 == 0}

We went to our class definition and introduced a new method header:

        def filter

I asked people how to do this and someone said we'd need to start with an empty list of values:

        def filter
          result = []

Then we have to go through every value in the range. In the each method we did that with a while loop that incremented a local variable. For this method we can use a for-each loop to keep things simple. But what does it loop over? It loops over the object itself. In Java we use "this" to refer to the object. In Ruby we use "self":

        def filter
          result = []
          for i in self
            ...
          end

And what do we want to do inside the loop? We want to test the value i to see if it satisfies our predicate. We do so with a call on yield passing it i. If it returns true, we add that value to the end of our list using the push method of the Array class:

        def filter
          result = []
          for i in self
            if yield(i) then
              result.push i
            end
          end

This is an unusual use of yield. Calling yield(i) is not unusual. That's what we did before. What's unusual is that we are using the value returned by that call in our if expression. So information flows in both directions. We pass the value of i to a block and we use the value returned by that block to decide whether or not to include that value in our result.

And what's left to do after that? We just have to return our result:

        return result

Some Ruby programmers don't like to use "return". They simply list the value to return because a call on a Ruby method returns the value of the last expression it evaluates:

        result

I tend to include the return, but mostly because I'm only a tourist in Rubyland and I'm more used to that syntax. Putting this all together, we ended up with the following complete filter method:

        def filter
          result = []
          for i in self
            if yield(i) then
              result.push i
            end
          end
          return result
        end

It worked as expected when we tried to filter for even numbers or numbers divisible by 3:

        >> x = MyRange.new(1, 20)
        => #<MyRange:0xb7f0d7fc @last=20, @first=1>
        >> x.filter{|n| n % 2 == 0}
        => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
        >> x.filter{|n| n % 3 == 0}
        => [3, 6, 9, 12, 15, 18]
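
For anyone who wants to try this without the class file from lecture, here is a minimal sketch of the whole thing. The constructor and the each method are assumptions reconstructed from the @first and @last fields in the irb output and from the description of each as a while loop; only filter is copied from above.

```ruby
# A minimal sketch of MyRange; initialize and each are assumptions.
class MyRange
  def initialize(first, last)
    @first = first
    @last = last
  end

  # Yields each integer from @first to @last in turn.
  def each
    i = @first
    while i <= @last
      yield i
      i += 1
    end
  end

  # Returns an array of the values for which the block returns a true value.
  def filter
    result = []
    for i in self
      if yield(i) then
        result.push i
      end
    end
    return result
  end
end

x = MyRange.new(1, 20)
evens = x.filter { |n| n % 2 == 0 }
threes = x.filter { |n| n % 3 == 0 }
p evens
p threes
```

The for loop works on self only because the class defines each, which is the method a for loop calls behind the scenes.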

Then I pointed out several higher-order methods that Ruby has. They are similar to what we have seen in OCaml and Scheme. There is a map method that expects a block specifying an operation to apply to each value in a structure:

        >> x = [1, 42, 7, 19, 8, 25, 12]
        => [1, 42, 7, 19, 8, 25, 12]
        >> x.map {|n| 2 * n}
        => [2, 84, 14, 38, 16, 50, 24]

There is a find method that expects a block that specifies a predicate:

        >> x.find {|n| n % 3 == 1}
        => 1

This version finds just the first occurrence. If you want to find them all, you can call find_all, which is really just another name for filter:

        >> x.find_all {|n| n % 3 == 1}
        => [1, 7, 19, 25]

Ruby also has methods for determining whether any value satisfies a certain predicate and whether all values satisfy a certain predicate:

        >> x.any? {|n| n % 3 == 1}
        => true
        >> x.all? {|n| n % 3 == 1}
        => false

These are computational equivalents of the mathematical existential quantifier ("there exists") and universal quantifier ("for all").
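
Both quantifiers can be written with the same yield technique we used for filter. The sketch below is not how Ruby implements any? and all?; exists? and for_all? are hypothetical names used only to show the idea of stopping as soon as the answer is known.

```ruby
# Hypothetical helper: true if the block returns a true value for some element.
def exists?(values)
  for v in values
    return true if yield(v)    # one witness is enough
  end
  false
end

# Hypothetical helper: true if the block returns a true value for every element.
def for_all?(values)
  for v in values
    return false unless yield(v)   # one counterexample is enough
  end
  true
end

x = [1, 42, 7, 19, 8, 25, 12]
some = exists?(x) { |n| n % 3 == 1 }
every = for_all?(x) { |n| n % 3 == 1 }
p some    # prints true
p every   # prints false
```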

Then I discussed the inject method. When you don't supply a parameter, it behaves like the reduce function in OCaml (collapsing a sequence of values into one value of the same type):

        >> [3, 5, 12].inject {|a, b| a + b}
        => 20

But you can also call it with a parameter, in which case it behaves like foldl:

        >> [3, 5, 12].inject("values:") {|a, b| a + " " + b.to_s}
        => "values: 3 5 12"

It's nice that the inject method also works on other types, such as ranges:

        >> (1..20).inject("values:") {|a, b| a + " " + b.to_s}
        => "values: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20"
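
To see what inject is doing when it is given a starting value, here is a rough sketch of the same computation written with a loop and yield. The name my_fold is hypothetical and this is not Ruby's actual implementation; it only mirrors the behavior shown above.

```ruby
# Hypothetical helper: threads an accumulator through the values,
# asking the block to combine the accumulator with each value in turn.
def my_fold(values, initial)
  acc = initial
  for v in values
    acc = yield(acc, v)
  end
  acc
end

sum = my_fold([3, 5, 12], 0) { |a, b| a + b }
text = my_fold(1..5, "values:") { |a, b| a + " " + b.to_s }
p sum    # prints 20
p text   # prints "values: 1 2 3 4 5"
```

Because the loop only needs each, the same helper works for arrays and ranges alike.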

I then spent a few minutes talking about a few extra features of Ruby. Just as there is a "puts" method to write a line of output, there is a "gets" method that reads a line of input from the user:

        >> x = gets
        hello there
        => "hello there\n"

I think it's unfortunate that Ruby decided to include the newline characters as part of the string returned by gets. There is a standard Ruby method called chomp that can be used to eliminate newline characters:

        >> y = gets.chomp
        how are you?
        => "how are you?"
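
The effect of chomp is easy to check without typing anything at the keyboard. The sketch below uses StringIO from the standard library to stand in for user input; the input text is made up for the illustration.

```ruby
require 'stringio'

# Simulated input standing in for what a user might type.
input = StringIO.new("hello there\nhow are you?\r\n")

raw = input.gets            # keeps the line terminator
clean = input.gets.chomp    # strips a trailing "\n" or "\r\n"
p raw     # prints "hello there\n"
p clean   # prints "how are you?"
```

Note that chomp removes only a line terminator at the end of the string, so any other whitespace is left alone.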

Then we reviewed file-reading operations. I mentioned that I particularly like the readlines method, as in:

        lst = File.open("hamlet.txt").readlines

This read in the entire contents of Hamlet into an array of strings. We were then able to ask questions like how many lines there are in the file or what the 101st line is:

        irb(main):002:0> lst.length
        => 4463
        irb(main):003:0> lst[100]
        => "  Hor. Well, sit we down,\r\n"
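
If you don't have hamlet.txt handy, the same operations can be tried on a small temporary file. This is a sketch under a few assumptions: the file contents are made up, and it uses File.readlines, a shortcut that opens the file, reads every line, and closes the file again.

```ruby
require 'tempfile'

lines = nil
Tempfile.create("sample") do |tmp|
  # Made-up contents, written only so there is something to read back.
  tmp.write("first line\nsecond line\nthird line\n")
  tmp.flush
  lines = File.readlines(tmp.path)
end

p lines.length   # prints 3
p lines[1]       # prints "second line\n"
```

Just as with gets, each string keeps its line terminator, which is why the Hamlet lines end with "\r\n".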

I asked people how we could write code to count the number of occurrences of various words in the file. We'd want to split each line using whitespace, which you can get by calling the string split method, as in:

        irb(main):004:0> lst[100].split
        => ["Hor.", "Well,", "sit", "we", "down,"]

To store the counts for each word, we need some kind of data structure. In Java we'd use a Map to associate words with counts. We can do that in Ruby with a hashtable:

        irb(main):005:0> count = Hash.new
        => {}

As we saw in an earlier lecture, we can use the square bracket notation to refer to the elements of the table. For example, to increment the count for the word "hamlet", we're going to want to execute a statement like this:

        count["hamlet"] += 1

Unfortunately, when we tried this out, it generated an error:

        irb(main):006:0> count["hamlet"] += 1
        NoMethodError: undefined method `+' for nil:NilClass
                from (irb):6
                from :0

That's because there is no entry in the table for "hamlet". But Ruby allows us to specify a default value for table entries that gets around this:

        irb(main):007:0> count = Hash.new 0
        => {}
        irb(main):008:0> count["hamlet"] += 1
        => 1
        irb(main):009:0> count
        => {"hamlet"=>1}
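
The difference between the two kinds of table is easy to see side by side. In this sketch the variable names are introduced just for the comparison.

```ruby
plain = Hash.new       # no default: a missing key gives nil
counted = Hash.new(0)  # default of 0: a missing key gives 0

missing_plain = plain["hamlet"]
missing_counted = counted["hamlet"]
size_after_lookup = counted.length   # looking up a key does not add it

counted["hamlet"] += 1               # reads the default 0, then stores 1

p missing_plain       # prints nil
p missing_counted     # prints 0
p size_after_lookup   # prints 0
p counted             # shows the single entry for "hamlet"
```

The default value is only what a lookup returns for a missing key. An entry appears in the table only when we assign to it, which is what += does.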

Using this approach, it was very easy to count the occurrences of the various words in the lst array:

        irb(main):007:0> count = Hash.new 0
        => {}
        irb(main):010:0> for line in lst do
        irb(main):011:1*     for word in line.split do
        irb(main):012:2*       count[word.downcase] += 1
        irb(main):013:2>     end
        irb(main):014:1>   end

After doing this, we could ask for the number of words in the file and the count for individual words like "hamlet":

        irb(main):022:0> count.length
        => 7234
        irb(main):023:0> count["hamlet"]
        => 28

The File object can be used with a for-each loop, so we could have written this same code without setting up the array called lst:

        irb(main):024:0> count = Hash.new 0
        => {}
        irb(main):025:0> for line in File.open("hamlet.txt") do
        irb(main):026:1*     for word in line.split do
        irb(main):027:2*       count[word.downcase] += 1
        irb(main):028:2>     end
        irb(main):029:1>   end
        => #<File:hamlet.txt>
        irb(main):030:0> count.length
        => 7234
        irb(main):031:0> count["hamlet"]
        => 28
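
Here is the same counting idea as a small script that runs without the Hamlet file. The first line is the one we looked at above; the second is made up for the illustration, and most_common is a name introduced here.

```ruby
lines = [
  "  Hor. Well, sit we down,\r\n",
  "We sit, and we wait.\r\n"      # made-up line, for illustration only
]

count = Hash.new(0)
lines.each do |line|
  line.split.each do |word|
    count[word.downcase] += 1
  end
end

# Find the entry with the largest count.
most_common = count.max_by { |word, n| n }

p count["we"]     # prints 3
p count["sit"]    # prints 1
p count["sit,"]   # prints 1
p most_common     # prints ["we", 3]
```

Splitting on whitespace leaves punctuation attached, so "sit" and "sit," are counted as different words. That is worth keeping in mind when reading counts like the ones above.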

The key point here is that it is possible to write just a few lines of Ruby code to express a fairly complex operation to be performed. We'd expect no less from a popular scripting language.

Stuart Reges

Last modified: Fri May 17 14:31:06 PDT 2024