CSE 373 Assignment 2

Due Friday April 14 in class.

You do not have to write C code unless the problem specifically says it is a programming problem. High level pseudocode is easier to write; just express the steps needed without worrying about exact details like array bounds, loop syntax etc. We just want you to express what needs to be done, and if a step is a loop or other operation that takes non-constant time, to make that clear.

Read the assignment guidelines!

1.  Balancing parentheses. Consider a program error checker that reads in the text of a program and checks for a limited set of syntax errors. Specifically, there are some special symbols or keywords that must always be in a balanced pair, such as ( ) [ ] { }. There are some important rules for pairs. 
(i) every starting symbol must have a matching end symbol;
(ii) every end symbol must come after its matching start symbol (e.g. ] [ is illegal);
(iii) pairs must be closed off in the reverse order in which they were started (e.g. ( [ { ] } ) is illegal);
(iv) pairs may be nested inside each other (e.g. [ [ ( ) ] ] is legal);
(v) a brace pair { } may only be nested within another brace pair, not within a parentheses or bracket pair.

All other symbols and words in the program can be skipped, since we are only checking for this specific type of error.

a) Of the data structures we've talked about so far (lists, stacks, queues, trees, binary search trees, AVL trees, hash tables), which one or ones would you use in this program? Motivate your choice in terms of the problem definition above. Why is your choice appropriate?

b) What sequence of operations would you need to perform for each of the six symbols above when they appear during your scan of the input file? Express the operations in terms of ADT calls on the data structure(s) you chose above, plus any necessary C code (or variables) to go with it. Give pseudocode; this shouldn't require more than a handful of lines each. If the input is illegal, you must report which rule is violated, and the specific symbols in the input that were illegal.

c) For each of the six symbols that you handled in part b, explain how your pseudocode fragment upholds all 5 rules above. You don't have to give a formal proof, but your argument must be sufficiently detailed that no one will wonder if you have a bug lurking in your algorithm. Just running a program on sample data isn't good enough, because you might overlook an important case. We want a convincing explanation of why each step you take is correct. Every possible illegality must be caught and noted by your algorithms.

 

2. Problems 4.2 and 4.3 (same numbers in both books).

 

3. For figure 4.59 (C++ 4.67), list the 13 elements of the tree in the following traversal orders:
a) preorder
b) postorder
c) inorder

 

4. Give an inductive proof that a binary tree of height H has 2H leaf nodes, maximum. (Recall that a tree with a single node is considered to have a height of one.)

 

5. Use the result you proved in part 4 to solve book problem 4.5 (same number in both books).

 

6. Given an input sequence { 2004, 5907, 9283, 7113, 3102, 6294 } and a hash function h(X) = X mod 10, show the resulting hash tables (table size = 10) under the following schemes.
a) chaining
b) linear probing
c) quadratic probing

7. Describe whether you think the following features would be easy or difficult to add to a hash table. Explain why. (It's OK to modify the data structure slightly, e.g. by storing a little extra data. Don't use a whole extra ADT.) Give asymptotic time costs for the best algorithm you can think of in each case. You don't need to give code for the algorithm, just a sentence or two saying what would need to be done. If adding the feature would require extra work inside the Insert or Find operations, describe the problem. Assume the hash function is fairly complex and scatters things throughout the table with no discernible pattern.
a) GetCount( ) -- returns number of items stored in the table
b) GetMax( ) -- returns the largest item in the table
c) GetMin( ) -- returns the smallest item in the table
d) FindInRange(x, y) -- if there is an item k in the table, with x < k < y, return any such item k.

 

8. (not graded) How difficult would you rate this assignment, on a scale from 1 (too easy), to 5 (just right), to 10 (too hard)?  How many hours did you spend on the written questions above?  How many hours did you spend on the programming section below?

 

Programming Section

The goal this week is to give you a small programming assignment to get you back into the lab, and to see where you have trouble.  Since much of data structure programming in C or C++ involves pointer manipulation, we'll do the tiny subset of a full ADT which does the pointer munging.

You will implement AVL tree rotations. I have provided an implementation of binary search trees in the following files, bst.cpp, bst.h, treein.txt (a default input file). Read through the source and make sure you understand what the code is doing. Your job is to modify the Insert routine to make the BST into an AVL tree. You'll probably want to take a careful look at the book's discussion of AVL trees, and write a few support functions for the 4 different cases.

To be explicit, there are several tasks you'll need to do to complete this part of the assignment.

i) Modify the data structure to include the extra info needed by an AVL tree in each node
ii) Modify the insert routine to check for height imbalances after an insertion. You need to check for all 4 cases separately, since they require slightly different handling. The book can help you here. Fix the imbalances using the appropriate rotation type. The routine also needs to update height info after an insert.
iii) Write the support routines for all four cases.
iv) Modify the dump() routine to include AVL height info, so you can see what you've done. Make liberal use of this routine to help you see what your program is doing.

One big danger area I suggest you pay careful attention to is updating the height info. There may be more than one place where you have to do this. Also note that, since AVL rotation is an O(1) operation, it is unacceptable if you take O(log N) or O(N) time after every insert to fix the height info in the tree.  You must do it in constant time.

We'll send info later on how to turn in your files online. You also need to turn in a printout of your code with the rest of the assignment.