Assigned: Monday, March 28, 2011
Due: Wednesday, March 30, 2011 at 11:59 PM
Turnin: Online
Let's test the claim that understanding the memory hierarchy can be useful in writing efficient programs. An example in the first-day lecture slides said that interchanging two loops has no effect on the correctness of the results, but can give a 21x difference in performance. Let's see about that.
Here's the important part of the code. It computes exactly the same thing no matter which of the two loops is outermost.
for ( i = 0; i < 2048; i++ )
{
for ( j = 0; j < 2048; j++ )
{
//src[i][j] = i * rep;
dst[i][j] = src[i][j];
}
}
You will download a set of four tiny programs—one in C and three in Java—that contain those loops. You'll compile them and time how long it takes them to run. For the C program, you'll compile both with and without compiler optimizations enabled, so in total you will have five programs to compare at a time (3 Java + C compiled two ways).
You will do this several times, making small modifications to see what differences they make—how the choice of language affects performance and how effective the compiler can be at optimizing your code when you:
i
and j
loopsYou'll run each version of the code and measure how long it takes to complete. With all the permutations (5 executables x 2 loop orderings x 2 commented/uncommented line versions x 2 array sizes), that's 40 versions. (It will be easy—just read all the way through these instructions first.)
You'll then turn in a short document, described below, in which you summarize your test results and answer a few questions.
The assignment requires use of a Linux system (or a Mac, if the right software is installed by default):
attu.cs.washington.edu
,
accessed via ssh
. Fetch the files, which are provided as a tar
archive: hw0.tar.gz .
Save them to a directory in which you want a new directory (containing the files) created.
Now issue the command 'tar xzf hw0.tar.gz
' (without the quotes). That will un-archive the files,
creating directory hw0
. In that directory you will find these files:
hw0.java Rows 'Java' in your tables of test results (see below) hw0Integer.java Rows 'JavaInteger' in your tables hw0IntegerInteger.java Rows 'JavaIntegerInteger' in your tables hw0.c Rows 'C' and 'Optimized C' in your tables run.pl See "Automating" below
To compile the C program without optimizations, cd
to the hw0
directory and type
'gcc -Wall hw0.c
'. That results in an executable named a.out
. To compile the program
with optimizations, type 'gcc -Wall -O2 hw0.c
', which also produces executable a.out
(overwriting it).
To run a.out
, you would type './a.out
'.
(Note: You don't actually want to do this. See the next heading about obtaining timings.)
To compile hw0.java
type 'javac -Xlint hw0.java
', which produces hw0.class
.
Do the same thing for the other Java programs.
To run it, type 'java -Xmx640M -cp . hw0
'. (Again, this is a command you need
to time, so read on.)
On Linux, you can measure the CPU time consumed by any execution using the time
program.
For example:
$ /usr/bin/time ./a.out 0.12user 0.03system 0:00.16elapsed 95%CPU (0avgtext+0avgdata 66704maxresident)k 0inputs+0outputs (0major+8287minor)pagefaults 0swaps
This executes the command (./a.out
) and then prints information about the resources it consumed.
(Type 'man time
' to obtain more information about the time program and ways to format its output.)
The only information we'll use is the user time ('0.12user', meaning 0.12 seconds of CPU time consumed while not in the operating system) and the system time ('0.03system', meaning 0.03 CPU seconds spent by the operating system doing things for this application). The measured time we want is the sum of those two. For this example, the measured time would be 0.15 seconds.
Measured times are likely to vary quite a bit from one run to the next, even without changing anything. (This course will explain some of the reasons why.) Note that all the programs wrap the two array-copying loops with another loop that causes the copy to be performed 10 times. One goal of that is to reduce the amount of variability in the measurements.
The distribution includes an optional script, run.pl
, that automates some of the chore of
running the five executables and gathering measurements.
To run it, type './run.pl
'. It compiles each of the source files (and hw0.c
twice;
with and without optimizations), runs each with the time
command, and reports the
sum of the user and sytem times.
run.pl
works everywhere I've tried it (including the CSE virtual machine). It should work for you,
but it is an optional (and unsupported) tool.
src
. Re-compile and re-measure.i
and j
loops. Recompile and re-measure.src
(with the i
and
j
loops still reversed). Re-compile and re-measure.Collect your results in a short document (PDF, text, or Word) with the following sections:
cat /proc/cpuinfo
'. Give us the model name, as listed.Array Size | Performingsrc assignment? |
App | i then j | j then i |
2048 | No | Java | measured time | etc. |
JavaInteger | ||||
JavaIntegerInteger | ||||
C | ||||
Optimized C |
Related to that, try compiling this C program, with and without optimization, and then timing running it:
#include <stdio.h>
#define SIZE 1000000
int main()
{
int i, j, k;
int sum = 1;
for (i = 0; i < SIZE; i++)
for (j = 0; j < SIZE; j++)
for (k = 0; k < SIZE; k++)
sum = -sum;
printf("hello, world\n");
return 0;
}
Now replace the printf
line with
printf("Sum is %d\n", sum);
and compile/run unoptimized and optimized. Please turn in a .pdf
, .txt
, or .doc
or .docx
file using
the Catalyst turn-in page for
this assignment (you can submit even if you're not yet officially enrolled.) The navigation sidebar
above also has a link to the main Catalyst 351 turnin page.
If you bring your results to section on Thursday, we can discuss them as a group and find out whether the class saw any interesting patterns.
Grading is basically binary. If you convince us that you actually did the assignment, on time, then full credit. If you fail to convince us, then no credit.
Note that there are two kinds of work required to "do the assignment"—the mechnical aspects of modifying and measuring the code, and the intellectual effort of thinking about the results.