CSE331 Autumn 2011
Software Design and Implementation

Assignment 4 (10%): Due Wednesday November 9, 2011 at 11:59PM

This assignment was motivated by a number of questions from students (on Piazza, in office hours, etc.) and some answers to some of the midterm questions. Alert: It's a seat-of-the-pants definition of the assignment, so it's possible there are gaps in it. For anyone who finds it straightforward, there is an optional part, but it is not part of the grade.

Program Description

One question on the midterm was about random test generation. A number of answers said things lik

The randomly generated array length might not be consistent with the number of values in the array.
The randomly generated array might not be sorted.
Random keys are much more likely to be not found than to be found.
There's no way to determine the oracle.

We will try to address these issues in this assignment. (There are some other problems we will not address in this assignment including making sure to cover edge cases and the ability to repeat random tests deterministically.) There are also parts intended to enhance your understanding of abstraction functions and representation invariants, and perhaps a bit on design patterns.

Here's the basic idea. You will write a program that generates legal test cases for binary search, specifically the Java 6.0 binarySearch over int arrays, which simply takes a sorted int array and a key. How will you address the four issues above?

Length and number of values must be consistent: A straightforward way to handle this is to generate a length (within some bounds, using some distribution) and then create an array of that size for which you will generate random values.
- You should select a length for the array between 0 and 1000 (inclusive); you should select values for the ints in the array between [-10000,10000] (inclusive).
- These values should be easy to change (at least using constants and optionally using a configuration file).
Array must be sorted. There are two ways to do this. The first is to sort a randomly generated array. The second is to generate the array in a way so that it is sorted. You will implement the second way, which itself has two approaches. One, you can ensure that your code never generates an int less than the most-recently generated int; this is straightforward. Two, you can generate ints in any order, and insert them into a data structure that can easily be traversed to produce a sorted array. You will implement the second way.
- You will use a binary search tree (BST). This is not to be mistaken for a binary search algorithm, although there are some commonalities. Here is the wikipedia entry on BST. And an applet that animates a BST might help you (I recommend using the random button to play with it at first). You will not have to be concerned about deletion of nodes nor about any destructive updates (for saving space, rebalancing, etc. -- you will have to update node links as described in a bullet below). The basic idea of a BST is to create a binary tree with the following properties.
  - Each node in the tree has an int value and two children, a left child and a right child.
  - The value of a node has to be greater than or equal to the value of its left child, and less than or equal to the value of the right child.
  - Elements are inserted sequentially -- the first value goes in the top node of the tree (there are different ways to handle this) with two null children. After that, a value is inserted by comparing it to the top node, determining if it is less than or equal to the value of the node -- in which case it is recursively inserted in the tree represented by the left child -- or greater than or equal to the value of the node -- in which case it is recursively inserted in the tree represented by the right child. When a null node is encountered, a null node is created, filled with the inserted value, and linked in appropriately to the existing BST.
- There are plenty of BST implementations in Java on the web, which you can look at. Here is one that I haven't fully vetted but looks fine (except I'd prefer traversal to use the visitor pattern explicitly). You can base your own implementation on one of these, but you must give appropriate credit. And make sure that your implementation doesn't include methods, etc. that you do not need for this assignment. (This is one way we will be able to tell if you took an existing implementation, didn't cite it, and didn't understand it enough to modify it. That would be bad :-).)
Good distribution of found/not found keys. Once you generate the sorted array, you can use any distribution you wish to select a key that is found or not found. For instance, if you want 10% of the tests to have keys that should be found, roll a 10-sided die and use it to decide whether to randomly select one of the elements in the array or to randomly select an int not in the array.
- Like #1, the percentage of found elements should be easy to change.
No oracle. Once you've handled the previous point, you've essentially created an oracle -- you know if it should be found or not found, and if it's found you know the index. (This is a tad tricker than it looks, because if there are multiple occurrences of the key, you might choose one and the binarySearch might choose the other.)

A few more things:

You must generate runnable JUnit tests and store them into a file named BinarySearchRandomTests.java. (Although this is a bit of a pain, it allows you to more easily test your program by running the generated tests on binarySearch.)
You might decide to first generate the basics, and then work on the formatting into JUnit tests. You can copy-and-paste the tests into Eclipse.
You should generate a complete test class -- imports, etc., so that copy-and-paste will "just work."
Your program must accept a command-line parameter indicating how many JUnit tests you are to generate during a single run of your program.
You must be explicit about the representation invariant for the BST and must implement it as repOK(). You will want this to ensure that you have a legal BST.
You must be explicit about the abstraction function for the BST -- this is somewhat easier than the rep invariant, but still takes a little attention. This appears in documentation, not in the code per se.
You must write tests to test your program carefully.

Warnings

Remember, this is not necessary a well-designed assignment. This suggests -- more strongly than usual -- that you start very early.
You have to keep your mind clear, especially in the following dimensions
- binarySearch (which you are testing) vs. binary search trees (which you are using in your implementation)
- Tests you are generating for binarySearch vs. tests of your program that generates those tests.

Optional Options

If anybody wants additional challenges, here are some options. (Again, not for credit. So make sure you have the base program right first, and then work from a copy of that.)

Support the ability to rerun a test deterministically by storing the values generated for a test (array, key, oracle). Depending on how you expect this feature to be used, it could be implemented with a pattern.
Figure out a clean way to increase the likelihood of generating some edge cases -- for instance, perhaps skew the distributions. (This could be used to make some specific sized arrays more/less likely, and to make some placements of found keys more/less likely, etc.)
Make it more generic -- this may be tricky with respect to generating the random elements.

CSE331 Autumn 2011 Software Design and Implementation

CSE331 Autumn 2011
Software Design and Implementation