CSE341 Notes for Friday, 5/10/24

I continued our discussion of recursive descent parsing. We ended the previous lecture with this code for parse-factor:

        (define (parse-factor lst)
          (if (not (and (pair? lst) (string? (car lst))))
              (error "invalid syntax")
              (let ([first (car lst)]
                    [rest (cdr lst)])
                (if (and (pair? rest) (eq? (car rest) '+))
                    (let* ([result (parse-factor (cdr rest))]
                           [text (plus first (car result))])
                      (cons text (cdr result)))
                    lst))))

I next turned our attention to writing parse-term. Recall that we are working with this grammar:

        ; grammar is:
        ; <term> ::= <factor> | <factor> * <term>
        ; <factor> ::= <string> | <string> + <factor>

Both rules for <term> begin with a <factor>, so we began by calling parse-factor to compute a result:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            ...

I asked whether we need error-checking code like we had in parse-factor, and somebody said we don't for those cases, because we start by calling parse-factor, which already performs those checks. So we can assume that result is a nonempty list that starts with a string. The next step is to test whether that string is followed by *, which would mean we have to process that operator. Someone pointed out that we have no guarantee that there is a second value, so we have to be careful to first test that the list has at least two elements:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
              ...

For the "then" part of the if/else, we know that we should be using our second rule from the grammar, because we have a factor followed by *, which should in turn be followed by a term. So the next step is to call parse-term on the result list, skipping the string and the *:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
                (let ([result2 (parse-term (cddr result))]
                  ...

We should now have a list called result whose first element is the first string we should use (possibly obtained by collapsing several tokens into one in the call on parse-factor) and a list called result2 whose first element is the second string we should use (the result of collapsing the tokens that followed). So we are in a position to apply the * operator to figure out what string to include in our overall result:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
                (let* ([result2 (parse-term (cddr result))]
                       [text (times (car result) (car result2))])

As with parse-factor, we had to change this to a let* because we need the value of result2 to compute the value of text. At this point we're almost done. We have computed the string that should go at the front of our overall result, so as with parse-factor, we can just put the pieces together. We also need to include an "else" part for the case where we have no * operator to process:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
                (let* ([result2 (parse-term (cddr result))]
                       [text (times (car result) (car result2))])
                  (cons text (cdr result2)))
                result)))

This version behaved as expected:

        > (parse-term test1)
        '("(a, (b, (c, d)))")
        > (parse-term test2)
        '("[a-[b-[c-d]]]")
        > (parse-term test3)
        '("[(a, b)-[c-[(d, e)-[f-(g, h)]]]]")

I saved this version of the parsing functions in a file called grammar1.rkt.
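
To try this version outside of lecture, here is a minimal self-contained sketch. The definitions of plus and times and the three test lists are not shown in these notes; the versions below are assumptions, chosen only because they reproduce the output shown above, and the real definitions may differ.

```racket
#lang racket

; ASSUMED helpers: output formats inferred from the sample results in the notes.
(define (plus a b) (string-append "(" a ", " b ")"))
(define (times a b) (string-append "[" a "-" b "]"))

; <factor> ::= <string> | <string> + <factor>
(define (parse-factor lst)
  (if (not (and (pair? lst) (string? (car lst))))
      (error "invalid syntax")
      (let ([first (car lst)]
            [rest (cdr lst)])
        (if (and (pair? rest) (eq? (car rest) '+))
            (let* ([result (parse-factor (cdr rest))]
                   [text (plus first (car result))])
              (cons text (cdr result)))
            lst))))

; <term> ::= <factor> | <factor> * <term>   (right-to-left)
(define (parse-term lst)
  (let ([result (parse-factor lst)])
    (if (and (> (length result) 1) (eq? (cadr result) '*))
        (let* ([result2 (parse-term (cddr result))]
               [text (times (car result) (car result2))])
          (cons text (cdr result2)))
        result)))

; ASSUMED test inputs: token lists mixing strings with the symbols + and *.
(define test1 '("a" + "b" + "c" + "d"))
(define test2 '("a" * "b" * "c" * "d"))
(define test3 '("a" + "b" * "c" * "d" + "e" * "f" * "g" + "h"))
```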

In this version of the grammar, both operators evaluate right-to-left. I said that I wanted to change the * operator to evaluate left-to-right. We were able to make this change by switching the order of the nonterminals in the second rule:

        ; <term> ::= <factor> | <term> * <factor>

This produces a different kind of parse tree than we had before. For the expression "a * b * c" we get:

                             <term>
                           /    |   \
                        /       |      \
                     /          |         \
                <term>          |        <factor>
              /   |   \         |           |
            /     |    \        |           |
         <term>   |   <factor>  |        <string>
           |      |      |      |           |
           |      |      |      |           | 
        <factor>  |      |      |           |
           |      |      |      |           |
        <string>  |   <string>  |           |
           |      |      |      |           |
           a      *      b      *           c

Notice how "a * b" is grouped together first, lower in the tree, before that result is combined with the second * and c.
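
The difference between the two groupings can be seen by applying the operator by hand. This is only a sketch: the definition of times is an assumption inferred from the sample output in these notes, and the names right-grouped and left-grouped are made up for illustration.

```racket
#lang racket

; ASSUMED definition of times, inferred from the sample output in the notes.
(define (times a b) (string-append "[" a "-" b "]"))

; Original rule <factor> * <term>: "b * c" is combined first.
(define right-grouped (times "a" (times "b" "c")))  ; "[a-[b-c]]"

; New rule <term> * <factor>: "a * b" is combined first.
(define left-grouped (times (times "a" "b") "c"))   ; "[[a-b]-c]"
```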

We can't translate this rule directly into function calls, because the rule is left-recursive: parse-term would begin by calling parse-term on the same list without consuming any input, which would lead to infinite recursion. To make this work, we have to somehow make the list shorter before we recurse. The version we have currently calls parse-factor followed by a call on parse-term. The trick is to change the second call to a call on parse-factor:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
                (let* ([result2 (parse-factor (cddr result))]
                       [text (times (car result) (car result2))])
                  (cons text (cdr result2)))
                result)))

This isn't quite right because it processes only a single * operator, replacing the operator and its two operands with a new string, and there might be more * operators to process. So we need to include a call on parse-term with the overall result we have computed:

        (define (parse-term lst)
          (let ([result (parse-factor lst)])
            (if (and (> (length result) 1) (eq? (cadr result) '*))
                (let* ([result2 (parse-factor (cddr result))]
                       [text (times (car result) (car result2))])
                  (parse-term (cons text (cdr result2))))
                result)))

This version behaved as expected:

        > (parse-term test1)
        '("(a, (b, (c, d)))")
        > (parse-term test2)
        '("[[[a-b]-c]-d]")
        > (parse-term test3)
        '("[[[[(a, b)-c]-(d, e)]-f]-(g, h)]")

I said that I would store this version of the parsing functions in a file called grammar2.rkt.
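
One way to read the left-to-right version: the recursive call always receives a list whose first element is the string built so far, so the function behaves like a loop that folds in one operand at a time from the left. Below is a sketch of that reading using a named let. The name parse-term-loop and the definitions of plus and times are assumptions for illustration (the helper output formats are inferred from the results above); this is not code from lecture.

```racket
#lang racket

; ASSUMED helpers: output formats inferred from the sample results in the notes.
(define (plus a b) (string-append "(" a ", " b ")"))
(define (times a b) (string-append "[" a "-" b "]"))

; <factor> ::= <string> | <string> + <factor>   (same as in the notes)
(define (parse-factor lst)
  (if (not (and (pair? lst) (string? (car lst))))
      (error "invalid syntax")
      (let ([first (car lst)]
            [rest (cdr lst)])
        (if (and (pair? rest) (eq? (car rest) '+))
            (let* ([result (parse-factor (cdr rest))]
                   [text (plus first (car result))])
              (cons text (cdr result)))
            lst))))

; <term> ::= <factor> | <term> * <factor>   (left-to-right)
; Hypothetical variant: parse one factor up front, then keep combining it
; with the next factor for as long as a * follows.
(define (parse-term-loop lst)
  (let loop ([result (parse-factor lst)])
    (if (and (> (length result) 1) (eq? (cadr result) '*))
        (let* ([result2 (parse-factor (cddr result))]
               [text (times (car result) (car result2))])
          (loop (cons text (cdr result2))))
        result)))
```

Under these assumed helpers, (parse-term-loop '("a" * "b" * "c")) should return '("[[a-b]-c]"). The only behavioral difference from the lecture version is that the loop skips the extra call on parse-factor for the combined string, which would have returned its argument unchanged anyway.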

I spent the remainder of the lecture describing the grammar we will be using for the homework. It comes from Python, and the assignment involves implementing a mini version of the interactive Python interpreter found in IDLE. That information is all in the assignment writeup, so I won't repeat it here.


Stuart Reges
Last modified: Fri May 10 18:19:47 PDT 2024