CS227 handout #26

handout #26

CSE143—Computer Programming II

Programming Assignment #8 (part 2)

due: Tuesday, 8/16/05, 9 pm

(courtesy of Stuart Reges)

This assignment is worth a total of 30 points. This document explains part 2 of the assignment, which is worth approximately half of the 30 points. Handout 25 has the description of part 1.

There are two new main programs that are used in this part of the assignment. Recall that MakeCode.java examined an input file and produced a code file for compressing it. The program Encode.java uses this code and the original file to produce a binary file that is the compressed version of the original. The program Decode.java uses the code and the binary file from Encode to reconstruct the original file. Encode is a complete program that doesn’t need the Huffman tree. Decode depends on your HuffmanTree class to do most of the work. In particular, you will have to add two new methods to your HuffmanTree class:

Method	Description
HuffmanTree(Scanner input)	Constructs a Huffman tree from the Scanner. Assumes the Scanner contains a tree description in standard format.
void decode(BitInputStream input, PrintStream output, int eof)	Reads bits from the given input stream and writes the corresponding characters to the output. Stops reading when it encounters a character with value equal to eof. This is a pseudo-eof character, so it should not be written to the output file. Assumes the input stream contains a legal encoding of characters for this tree’s Huffman code.

The first of these methods is an alternative constructor. In part 1 you wrote a constructor that took an array of frequencies and constructed an appropriate tree given those frequencies. In this part you are given a Scanner that contains the file produced by your write method from part 1. In other words, the input for this part is the output you produced in part 1. You are using your own output to recreate the tree. For this second part, the frequencies are irrelevant because the tree has already been constructed once, but you are using the same node class as before. You can set all of the frequencies to some standard value like 0 or -1 for this part.

Remember that the standard format was a series of pairs of lines where the first line has an integer representing the character’s ASCII value and the second line has the code to use for that character. You might be tempted to call nextInt() to read the integer and nextLine() to read the code, but remember that mixing token-based reading and line-based reading is not simple. Here is an alternative that uses a method called parseInt in the Integer class that allows you to use two successive calls on nextLine():

int n = Integer.parseInt(input.nextLine());

String code = input.nextLine();

For the decoding part, you have to read a BitInputStream. This is a special class that Stuart wrote that works in conjunction with another class called BitOutputStream. They each have a very simple interface. They allow you to write and read compact sequences of bits.

The only method you’ll use for BitInputStream is the following method which returns the next bit from the file:

public int readBit()

Your method is doing the reverse of the encoding process. It is reading sequences of bits that represent encoded characters and it is figuring out what the original characters must have been. Your method should start at the top of your tree and should read bits from the input stream, going left or right depending upon whether you get a 0 or 1 from the stream. When you hit a leaf node, you know you’ve found the end of an encoded sequence. At that point, you should write the integer code for that character to the output file. In doing so, call this method from the PrintStream class:

public void write(int b)

You don’t need to cast to char. You just write the integer for this particular character (the value between 0 and 255 that is stored in this leaf). Once you’ve written this character’s integer, you go back to the top of your tree and start over, reading more bits and descending the tree until you hit a leaf again. At that point you write again, go back to the top of the tree, read more bits and descend until you hit a leaf, then write the leaf, go back to the top of the tree, and so on.

Recognizing the end of the input will be tricky. Remember that we introduced a pseudo-eof character with a special value (256). The Encode method will write exactly one occurrence of this character at the end of the file. At some point your decoding method will come across this eof character. At that point it should stop decoding. It should not write this integer to the PrintStream because it isn’t actually part of the original file. The eof value will be 256 for this particular program, but your code isn’t supposed to depend on this specific value, which is why it is passed to your decode method as the third parameter.

If you fail to recognize the pseudo-eof character, you might end up accidentally reading past the end of the bit stream. When that happens, the readBit method returns a value of -1. So if you see a value of -1 appearing, it’s because you’ve read too far in the bit stream.

In terms of correctness, your class must provide all of the functionality described above. In terms of style, we will be grading on your use of comments, good variable names, consistent indentation and good coding style to implement these operations. Remember that you will lose points if you declare variables as data fields that can instead be declared as local variables. You should also avoid extraneous cases (e.g., don’t make something into a special case if it doesn’t have to be).

There is one turn-in for both parts of this assignment. Recall that you have to turn in HuffmanNode.java and HuffmanTree.java. A collection of files needed for this part of the assignment is included on the web page as ass8-2.zip. You should include Encode.java, Decode.java, BitInputStream.java and BitOutputStream.java in the same directory as your other program files. The zip file includes encoded versions of the sample files called short.short and hamlet.short. Your Decode program should be able to take one of these files and the corresponding code file to reconstruct the original file.

Extra Credit (5 points)

There is an extra credit option for this assignment that is worth 5 points. There will be no partial credit.

If you do the extra credit option, you are still required to complete the standard HuffmanTree and to submit it along with your HuffmanNode. So if you work on this, do so only after you have completed the standard assignment. To keep things clear, for this part of the assignment you should create a class called HuffmanTree2. You can copy your HuffmanTree class and modify it appropriately to get the initial version of this class.

The main goal of this variation is to eliminate the code file. When you use a utility like zip, you don’t expect it to produce two output files (a code file and a binary file). You expect it to produce one file. That’s what we’ll do in this variation. To do so, we’ll have to be able to include information in the binary file about the tree and its structure.

In the original version we had three programs: MakeCode, Encode and Decode. For this version there are two main programs: Encode2 and Decode2.

In all, you will have to include the following three new methods in your class along with the other methods we had in HuffmanTree:

Method	Description
HuffmanTree2(BitInputStream input)	Constructs a Huffman tree from the given input stream. Assumes that the standard bit representation has been used for the tree.
void assign(String[] codes)	Assigns codes for each character of the tree. Assumes the array has null values before the method is called. Fills in a String for each character in the tree indicating its code.
void writeHeader(BitOutputStream output)	Writes the current tree to the output stream using the standard bit representation.

In the original HuffmanTree we had a method called write that would write the codes to an output file. Here the Encode2 program does the actual encoding. It first reads the file and computes the frequencies. Then it calls your constructor to create an appropriate HuffmanTree. It has to have some way to find out what codes your tree has come up with so that it can encode the characters of the file. It does so by calling the assign method in your class passing it an array of Strings that are all null. Your method will replace the null’s with codes for the characters included in the tree.

The Encode2 program also calls the method writeHeader in your class. The idea is to write to the bit stream a representation of the tree that can be used to reconstruct it later. As we did with the QuestionTree in assignment 7, we can print the tree using a preorder traversal. For a branch node, we write a 0 indicating that it is a branch. We don’t need to write anything more, because the branch nodes contain no data. For a leaf node, we will write a 1. Then we need to write the ASCII value of the character stored at this leaf. There are many ways to do this. We basically need to write some bits that can be read later to reconstruct the character value. The value will require up to 9 bits to write (it would be 8 if it weren’t for our pseudo-eof character).

We need to decide on a convention for writing an integer in 9 bits that we can reverse later when we read it back in. Below are the two methods I would like you to use to do this. They are inverses of each other in that read9 will recreate what write9 writes to the stream:

// pre : 0 <= n < 512

// post: writes a 9-bit representation of n to the given output stream

private void write9(BitOutputStream output, int n) {

for (int i = 0; i < 9; i++) {

output.writeBit(n % 2);

n /= 2;

}

// pre : an integer n has been encoded using write9 or its equivalent

// post: reads 9 bits to reconstruct the original integer

private int read9(BitInputStream input) {

int multiplier = 1;

int sum = 0;

for (int i = 0; i < 9; i++) {

sum += multiplier * input.readBit();

multiplier *= 2;

}

return sum;

}

Encode2 produces a binary file that first has a header with information about the tree and then has the individual codes for the characters of the file. The Decode2 program has to use this information to reconstruct the original file. It begins by calling the constructor listed in the table above, asking your class to read the header information and reconstruct the tree. Once the tree has been reconstructed, the program calls your decode method from the original assignment to reproduce the original file.

A collection of files necessary for this part of the assignment will be on the handouts page called ass8-bonus.zip. It will include Encode2.java, Decode2.java and examples of encoded input files called short.bonus and hamlet.bonus. Encode2 should produce exactly the same output when used with your version of HuffmanTree2. There will be a separate turn-in for the bonus in which you submit HuffmanNode.java and HuffmanTree2.java (you shouldn’t need to make any changes to your node class for the bonus, but it will be easier for us to grade if you submit both when you turn it in).