CSE 303: Concepts and Tools for Software Development, Winter 2008
  CSE Home   About Us   Search   Contact Info 
 
Course Home
 Home
Administation
 Overview
 Course Wiki
 Email archive
 Anonymous feedback
 View feedback
 Homework Turnin
 
Most Everything
 Schedule
 
Other Information
 UW/ACM Tutorials
 303 Computing: Getting Started
   

CSE 303 Homework 5

Due: Sunday, 2/10/08 11:59PM
Turnin: online - turnin instruction

FAQ

  • HW5 Q&A Wiki Page
  • Overview

    This is intended to be a (very) short assignment providing practice at two things: (1) program interface design, and (2) malloc/free. It consists of two modifications to HW4 that would be very minor if you were an experienced C programmer (and should be reasonably minor for you at this point in the course).

    The thrust of the modifications has to do with the fact that the maximum number of distinct words the program can handle is a value chosen by you, the programmer, and so is set at program build time. If you were to ship this program, you'd (most likely) ship the executable; the constant you chose is built into the executable. Since you don't know what system your program will eventually be run on, you can't choose effectively; your choice is inevitably too big or too small for the user's machine.

    There are two options for addressing this (explained next).

    Option/Part 1. Let The Invoker Decide

    Accept a command line switch, of format "-n size", that lets the invoker the invoker specify the size of the array to be allocated (i.e., the maximum number of distinct words that run can handle).

    Note: Because the size of the array is not known until run time, you cannot use #define as part of the solution -- it's executed at compile time, before the value is known. Instead, you must wait until run time, get the value for the array size (from somewhere -- that's what you're designing), and malloc a sufficiently large array. (This means that the static global variable histo isn't an array any longer, it's just a pointer.)

    Option/Part 2. Let The Memory Available at Runtime Decide

    Build a mechanism that doesn't impose a fixed limit on the number of distinct words, even at program invocation time, but instead allows it to be (about) as big as all memory allows, if (and only if) necessary.

    This part is more constrained than the first part, in that you should implement a scheme that follows this basic approach:

    • You start with an array of some small, default size (possibly 0).
    • When a call to addWord() occurs and the current array is full, allocated a bigger array, copy the contents of the old one to the new one, free the old one, and then add the word to the new one.
    With this change addWord() should fail only if malloc() says it is out of memory. It is important that your program not waste huge amounts of memory - the amount it actually uses should be proportional to the number of distinct words it has seen.

    There are two ways to make the array bigger. Let S be the current size of the (now full) array, and newS the size of the replacement. Then I can do either of:

    • Additive increase: newS = S + C, for some constant C > 0.
    • Multiplicative increase: newS = K*S, for some constant K > 1.
    As an example (but not a very good one), additive increase might start with an array of length 0, then allocate an array of length 1 when the first word was added, then an array of length 2 when the second was added, etc. Multiplicative might start with an array of length 0, then allocate one of length 1 (this is a special case, since 0*K != 1 no matter the the value of K is - more on that in a minute), then lengths 2, then length 4, then length 8, etc. Each time a new array is allocated, the contents of the old one are copied to the new one, the old one is deleted, and the word to be added (the one that caused a new array to be allocated) is added to the new one.

    All of this happens inside addWord(), which is the focus of your changes. (Other methods might have to be edited slightly to deal with the changes you might make to the shared variables they use.)

    Option/Part 2 FAQ

    Q: Are we suposed to implement additive increase or multiplicative increase? A: Yes. (A little computer science "humor" there. But, also literally the answer to the literal question - your program should satisfy the boolean expression "additive increase or multiplicative increase".)

    Q: Is there a reason to start with an array of length 0, for both approaches?
    A: Yes.

    Q: Well, what is it?
    A: It's a program design issue, having to do with how the very first array is allocated. If it's allocated as a static variable, like this:

    static WordEntry histo[HISTO_SIZE};
    it cannot be free'd. That means in addWord() you'd need to be able to determine that you can't free the old array the first time you've allocated a new one, but would have to free every one after that. There's no elegant way to do this. You also can't do anything remotely like this histo = (WordEntry*)malloc(...);, so there are other programming issues: you'd need to use a different variable name for the first array than all the others (or something equivalently ugly).

    Intead, you'll want a (single) variable like this:

    static WordEntry* histo = NULL;
    Because it's static, it can be initialized, but only to a constant (like NULL, which is defined in , I think). It cannot be defined to the result of a malloc() call.

    You could insist that the user of your implementation (i.e., main.c) call some (new) initialization method you design, before doing anything else with your wordHisto, and in that initialization routine do a malloc(). Or, you could just detect in addWord() that histo is null, and deal with initialization there.

    Q: Have you told me every detail I need to implement this?
    A: No. For example, you'll need some (static) int's as well -- one, like in hw4, to keep track of how full the current allocation is, and another (replacing the #define symbol for the array length in hw4) to tell you how much space the currently allocated array has. This is just regular old programming with arrays, and has nothing to do with C in particular, though.

    Q: Any other C-specific things for me to keep in mind?
    A: Yes. Remember that malloc's argument is in units of bytes. You don't need to know what those are, but you do need to use the sizeof macro, which converts a type to the number of bytes required to store one instance of that type. For instance, sizeof(int) is the number of bytes required to store an int, so to malloc a new int, you'd say

    (int*)malloc(sizeof(int))
    and to malloc an array of 10 ints you'd say
    (int*)malloc(10*sizeof(int))

    Q: I don't know how to use malloc/free. What should I do?
    A: I'd start by writing tiny test programs, and making sure they work as you expect. In a loop, for instance, use malloc to allocate an int, then give it a value, then print it, then free it. (I'd do this in a loop because problems with free often sometimes don't show up until you've done another malloc.) With that working, I'd change to allocating an array of ints, initalizing, printing (maybe), and freeing. At that point, malloc/free, to the level needed in this program, should be easy.

    Q: What else?
    A: Design what you're going to do, on paper, before sitting down to do it. Do that in some detail - enough for it to point out anything you're not sure about how to implement in C. If you find there are things you're not sure how to implement in C, try them out with simple test programs before moving straight to the assignment.

    Additive/Multiplicative Increase

    Why isn't one strictly better than the other?

    Suppose the final number of distinct words that will eventually be seen is N. Additiive increasing the array size about N/C times, which requires copying a total of C + 2C + ... + C(N/C) elements, which is about N*N/C. At the same time, the worst case excess storage allocated is about C (i.e., is independent of N).

    Multiplicative increase requires move K + 2K + ... + K^(log(N)), which is about K*N/(K-1) total copies. On the other hand, it can over-allocate up to (K-1)*N elements, which is proportional to N. (A commonly used K is 2.)

    For this assignment, my guess is that the distinctions don't matter much in terms of running time or the size of the documents that can be handled. And, in terms of the assignment, the distinctions definitely don't matter - the code for the two is virtually identical.

    Starter Files

    You can start with any of your own hw4E-H solutions. Late on Wednesday HW4E solution files will be provided that can be used as the starting point for this assignment. (This delay is intentional. You should start by making sure you understand what you're going to do, by thinking about it and perhaps writing some tiny test statements, before trying to implement the solution. Delaying the release of the standard starter files encourages that thinking/testing. At the same time, if you don't think you need to do that, you can always start with your own hw4 solution.)

    Update: Starter files are now at attu:/cse/courses/cse303/08wi/hw5Dist.tar.gz


    Computer Science & Engineering
    University of Washington
    Box 352350
    Seattle, WA  98195-2350
    (206) 543-1695 voice, (206) 543-2969 FAX
    [comments to zahorjan at cs.washington.edu]