|
|
|
|
CSE 303 Homework 5
Due: Sunday, 2/10/08 11:59PM
Turnin: online - turnin instruction
FAQ
HW5 Q&A Wiki Page
Overview
This is intended to be a (very) short assignment providing practice at two
things: (1) program interface design, and (2) malloc/free. It consists
of two modifications to HW4 that would be very minor if you were
an experienced C programmer (and should be reasonably minor for you
at this point in the course).
The thrust of the modifications has to do with the fact that the
maximum number of distinct words the program can handle is a value
chosen by you, the programmer, and so is
set at program build time. If you were to ship this program,
you'd (most likely) ship the executable; the constant you chose
is built into the executable.
Since you don't know what system your program will eventually be run
on, you can't choose effectively; your choice is inevitably too big
or too small for the user's machine.
There are two options for addressing this (explained next).
Option/Part 1. Let The Invoker Decide
Accept a command line switch, of format "-n size", that lets the invoker
the invoker specify the size of the array to be allocated (i.e.,
the maximum number of distinct words that run can handle).
Note: Because the size of the array is not known until run time,
you cannot use #define as part of the solution -- it's executed at
compile time, before the value is known. Instead, you must wait until
run time, get the value for the array size (from somewhere -- that's what
you're designing), and malloc a sufficiently large array.
(This means that the static global variable histo isn't an array
any longer, it's just a pointer.)
Option/Part 2. Let The Memory Available at Runtime Decide
Build a mechanism that doesn't impose a fixed limit on the number
of distinct words, even at program invocation time, but instead
allows it to be (about) as big as all memory allows, if (and
only if) necessary.
This part is more constrained than the first part, in that you should
implement a scheme that follows this basic approach:
- You start with an array of some small, default size (possibly 0).
- When a call to addWord() occurs and the current array is full,
allocated a bigger array, copy the contents of the old one to the
new one,
free the old one, and then add the word to the new one.
With this change addWord() should fail only if malloc() says it is
out of memory. It is important that your program not waste huge
amounts of memory - the amount it actually uses should be proportional
to the number of distinct words it has seen.
There are two ways to make the array bigger. Let S be the current
size of the (now full) array, and newS the size of the
replacement. Then I can do either of:
- Additive increase:
newS = S + C , for some
constant C > 0 .
- Multiplicative increase:
newS = K*S , for some
constant K > 1 .
As an example (but not
a very good one), additive increase might start with an array of
length 0, then allocate an array of length 1 when the first word was
added, then an array of length 2 when the second was added, etc.
Multiplicative might start with an array of length 0, then allocate
one of length 1 (this is a special case, since 0*K != 1 no matter
the the value of K is - more on that in a minute),
then lengths 2, then length 4, then length 8, etc. Each time a new
array is allocated, the contents of the old one are copied to the new
one, the old one is deleted, and the word to be added (the one that
caused a new array to be allocated) is added to the new one.
All of this happens inside addWord(), which is the focus of your changes.
(Other methods might have to be edited slightly to deal with the changes
you might make to the shared variables they use.)
Option/Part 2 FAQ
Q: Are we suposed to implement additive increase or multiplicative
increase?
A: Yes. (A little computer science "humor" there. But, also literally
the answer to the literal question - your program should satisfy
the boolean expression "additive increase or multiplicative increase".)
Q: Is there a reason to start with an array of length 0, for both
approaches?
A: Yes.
Q: Well, what is it?
A: It's a program design issue, having to do with how the very first
array is allocated. If it's allocated as a static variable, like this:
static WordEntry histo[HISTO_SIZE};
it cannot be free 'd. That means in addWord()
you'd need to be able to determine that you can't free the old array
the first time you've allocated a new one, but would have to free every
one after that. There's no elegant way to do this. You also can't do anything
remotely like this histo = (WordEntry*)malloc(...); , so there
are other programming issues: you'd need to use a different variable name for
the first array than all the others (or something equivalently ugly).
Intead, you'll want a (single) variable like this:
static WordEntry* histo = NULL;
Because it's static, it can be initialized, but only to a constant
(like NULL, which is defined in , I think). It cannot be
defined to the result of a malloc() call.
You could insist that the user of your implementation (i.e.,
main.c) call some (new) initialization method you design, before doing
anything else with your wordHisto, and in that initialization routine
do a malloc() . Or, you could just detect in
addWord() that histo is null, and deal with
initialization there.
Q: Have you told me every detail I need to implement this?
A: No. For example,
you'll need some (static) int's as well -- one, like in hw4, to
keep track of how full the current allocation is, and another (replacing the #define symbol for the array length in hw4) to tell you how much space the
currently allocated array has. This is just regular old programming with
arrays, and has nothing to do with C in particular, though.
Q: Any other C-specific things for me to keep in mind?
A: Yes. Remember that malloc 's argument is in units of bytes. You don't need to know what those are, but you do need to use the
sizeof macro, which converts a type to the number of bytes
required to store one instance of that type. For instance, sizeof(int) is the number of bytes required to store an int, so to malloc a new int, you'd say
(int*)malloc(sizeof(int))
and to malloc an array of 10 ints you'd say
(int*)malloc(10*sizeof(int))
Q: I don't know how to use malloc/free. What should I do?
A: I'd start by writing tiny test programs, and making sure they
work as you expect. In a loop, for instance, use malloc to allocate an int,
then give it a value, then print it, then free it. (I'd do this in a loop
because problems with free often sometimes don't show up until you've done
another malloc.) With that working, I'd change to allocating an array of ints,
initalizing, printing (maybe), and freeing. At that point, malloc/free, to
the level needed in this program, should be easy.
Q: What else?
A: Design what you're going to do, on paper, before sitting down to do it.
Do that in some detail - enough for it to point out anything you're not sure
about how to implement in C. If you find there are things you're not sure
how to implement in C, try them out with simple test programs before
moving straight to the assignment.
Additive/Multiplicative Increase
Why isn't one strictly better than the other?
Suppose the final number of distinct words that will eventually be seen
is N. Additiive increasing the array size about N/C times, which
requires copying a total of C + 2C + ... + C(N/C) elements, which is about
N*N/C. At the same time, the worst case excess storage allocated is about
C (i.e., is independent of N).
Multiplicative increase requires move K + 2K + ... + K^(log(N)), which is
about K*N/(K-1) total copies. On the other hand, it can over-allocate up to
(K-1)*N elements, which is proportional to N. (A commonly used K is 2.)
For this assignment, my guess is that the distinctions don't matter
much in terms of running time or the size of the documents that can be
handled. And, in terms of the assignment, the distinctions definitely
don't matter - the code for the two is virtually identical.
Starter Files
You can start with any of your own hw4E-H solutions. Late on Wednesday
HW4E solution files will be provided that can be used as the starting
point for this assignment. (This delay is intentional. You should start
by making sure you understand what you're going to do, by thinking about it
and perhaps writing some tiny test statements, before trying to implement
the solution. Delaying the release of the standard starter files encourages
that thinking/testing. At the same time, if you don't think you need to
do that, you can always start with your own hw4 solution.)
Update: Starter files are now at attu:/cse/courses/cse303/08wi/hw5Dist.tar.gz
|