handout #26
CSE143—Computer Programming II
Programming Assignment #8 (part 2)
due: Tuesday, 8/16/05, 9 pm
(courtesy of Stuart Reges)
This
assignment is worth a total of 30 points.
This document explains part 2 of the assignment, which is worth
approximately half of the 30 points. Handout
25 has the description of part 1.
There are two new
main programs that are used in this part of the assignment. Recall that MakeCode.java
examined an input file and produced a code file for compressing it. The program Encode.java
uses this code and the original file to produce a binary file that is the
compressed version of the original. The
program Decode.java uses the code and the binary file
from Encode to reconstruct the original file.
Encode is a complete program that doesn’t need the Huffman tree. Decode depends on your HuffmanTree
class to do most of the work. In
particular, you will have to add two new methods to your HuffmanTree
class:
Method |
Description |
HuffmanTree(Scanner input) |
Constructs a Huffman tree from the Scanner. Assumes the Scanner contains a tree description in standard format. |
void decode(BitInputStream input, PrintStream output, int eof) |
Reads bits from the given input stream and writes the corresponding characters to the output. Stops reading when it encounters a character with value equal to eof. This is a pseudo-eof character, so it should not be written to the output file. Assumes the input stream contains a legal encoding of characters for this tree’s Huffman code. |
The
first of these methods is an alternative constructor. In part 1 you wrote a constructor that took
an array of frequencies and constructed an appropriate tree given those
frequencies. In this part you are given
a Scanner that contains the file produced by your write method from part
1. In other words, the input for this
part is the output you produced in part 1.
You are using your own output to recreate the tree. For this second part, the frequencies are
irrelevant because the tree has already been constructed once, but you are
using the same node class as before. You
can set all of the frequencies to some standard value like 0 or -1 for this
part.
Remember
that the standard format was a series of pairs of lines where the first line
has an integer representing the character’s ASCII value and the second line has
the code to use for that character. You
might be tempted to call nextInt() to read the
integer and nextLine() to read the code, but remember
that mixing token-based reading and line-based reading is not simple. Here is an alternative that uses a method
called parseInt in the Integer class that allows you
to use two successive calls on nextLine():
int n = Integer.parseInt(input.nextLine());
String code = input.nextLine();
For
the decoding part, you have to read a BitInputStream. This is a special class that Stuart wrote
that works in conjunction with another class called BitOutputStream. They each have a very simple interface. They allow you to write and read compact
sequences of bits.
The
only method you’ll use for BitInputStream is the
following method which returns the next bit from the file:
public int readBit()
Your
method is doing the reverse of the encoding process. It is reading sequences of bits that
represent encoded characters and it is figuring out what the original
characters must have been. Your method
should start at the top of your tree and should read bits from the input
stream, going left or right depending upon whether you get a 0 or 1 from the
stream. When you hit a leaf node, you know
you’ve found the end of an encoded sequence.
At that point, you should write the integer code for that character to
the output file. In doing so, call this
method from the PrintStream class:
public void write(int b)
You
don’t need to cast to char. You just
write the integer for this particular character (the value between 0 and 255
that is stored in this leaf). Once
you’ve written this character’s integer, you go back to the top of your tree
and start over, reading more bits and descending the tree until you hit a leaf
again. At that point you write again, go
back to the top of the tree, read more bits and descend until you hit a leaf,
then write the leaf, go back to the top of the tree, and so on.
Recognizing
the end of the input will be tricky.
Remember that we introduced a pseudo-eof
character with a special value (256).
The Encode method will write exactly one occurrence of this character at
the end of the file. At some point your
decoding method will come across this eof
character. At that point it should stop
decoding. It should not write this
integer to the PrintStream because it isn’t actually
part of the original file. The eof value will be 256 for this particular program, but your
code isn’t supposed to depend on this specific value, which is why it is passed
to your decode method as the third parameter.
If
you fail to recognize the pseudo-eof character, you
might end up accidentally reading past the end of the bit stream. When that happens, the readBit
method returns a value of -1. So if you
see a value of -1 appearing, it’s because you’ve read too far in the bit
stream.
In terms of correctness, your class must provide
all of the functionality described above.
In terms of style, we will be grading on your use of comments, good
variable names, consistent indentation and good coding style to implement these
operations. Remember that you will lose
points if you declare variables as data fields that can instead be declared as
local variables. You should also avoid
extraneous cases (e.g., don’t make something into a special case if it doesn’t
have to be).
There is one turn-in for both parts of
this assignment. Recall that you have to
turn in HuffmanNode.java and HuffmanTree.java. A collection of files needed for this part of
the assignment is included on the web page as ass8-2.zip. You should include Encode.java,
Decode.java, BitInputStream.java
and BitOutputStream.java in the same directory as
your other program files. The zip file
includes encoded versions of the sample files called short.short
and hamlet.short.
Your Decode program should be able to take one of these files and the
corresponding code file to reconstruct the original file.
Extra Credit (5 points)
There
is an extra credit option for this assignment that is worth 5 points. There will be no partial credit.
If you do the extra credit option, you
are still required to complete the standard HuffmanTree
and to submit it along with your HuffmanNode. So if you work on this, do so only after you
have completed the standard assignment.
To keep things clear, for this part of the assignment you should create
a class called HuffmanTree2. You can
copy your HuffmanTree class and modify it
appropriately to get the initial version of this class.
The main goal of this variation is to
eliminate the code file. When you use a
utility like zip, you don’t expect it to produce two output files (a code file
and a binary file). You expect it to
produce one file. That’s what we’ll do
in this variation. To do so, we’ll have
to be able to include information in the binary file about the tree and its
structure.
In the original version we had three
programs: MakeCode, Encode and Decode. For this version there are two main programs:
Encode2 and Decode2.
In all, you will have to include the following three new
methods in your class along with the other methods we had in HuffmanTree:
Method |
Description |
HuffmanTree2(BitInputStream input) |
Constructs a Huffman tree from the given input stream. Assumes that the standard bit representation has been used for the tree. |
void assign(String[] codes) |
Assigns codes for each character of the tree. Assumes the array has null values before the method is called. Fills in a String for each character in the tree indicating its code. |
void writeHeader(BitOutputStream output) |
Writes the current tree to the output stream using the standard bit representation. |
In the original HuffmanTree
we had a method called write that would write the codes to an output file. Here the Encode2 program does the actual
encoding. It first reads the file and
computes the frequencies. Then it calls
your constructor to create an appropriate HuffmanTree. It has to have some way to find out what
codes your tree has come up with so that it can encode the characters of the
file. It does so by calling the assign
method in your class passing it an array of Strings that are all null. Your method will replace the null’s with
codes for the characters included in the tree.
The Encode2 program also calls the method
writeHeader in your class. The idea is to write to the bit stream a
representation of the tree that can be used to reconstruct it later. As we did with the QuestionTree
in assignment 7, we can print the tree using a preorder traversal. For a branch node, we write a 0 indicating
that it is a branch. We don’t need to
write anything more, because the branch nodes contain no data. For a leaf node, we will write a 1. Then we need to write the ASCII value of the
character stored at this leaf. There are
many ways to do this. We basically need
to write some bits that can be read later to reconstruct the character
value. The value will require up to 9
bits to write (it would be 8 if it weren’t for our pseudo-eof
character).
We need to decide on a convention for
writing an integer in 9 bits that we can reverse later when we read it back
in. Below are the two methods I would
like you to use to do this. They are
inverses of each other in that read9 will recreate what write9 writes to the
stream:
// pre : 0 <= n < 512
// post: writes a 9-bit representation of n to the
given output stream
private void write9(BitOutputStream
output, int n) {
for (int i = 0; i
< 9; i++) {
output.writeBit(n % 2);
n /=
2;
}
}
// pre : an integer n has been encoded using write9
or its equivalent
// post: reads 9 bits to reconstruct the original
integer
private int read9(BitInputStream input) {
int multiplier = 1;
int sum = 0;
for (int i = 0; i
< 9; i++) {
sum +=
multiplier * input.readBit();
multiplier *= 2;
}
return
sum;
}
Encode2 produces a binary file that first
has a header with information about the tree and then has the individual codes
for the characters of the file. The
Decode2 program has to use this information to reconstruct the original
file. It begins by calling the
constructor listed in the table above, asking your class to read the header
information and reconstruct the tree.
Once the tree has been reconstructed, the program calls your decode
method from the original assignment to reproduce the original file.
A collection of files necessary for this
part of the assignment will be on the handouts page called ass8-bonus.zip. It will include Encode2.java, Decode2.java
and examples of encoded input files called short.bonus
and hamlet.bonus.
Encode2 should produce exactly the same output when used with your
version of HuffmanTree2. There will be a
separate turn-in for the bonus in which you submit HuffmanNode.java
and HuffmanTree2.java (you shouldn’t need to make any changes to your node
class for the bonus, but it will be easier for us to grade if you submit both
when you turn it in).