Final Exam for CSE 378

Snyder/Wolman 16 Dec '97

 

Instructions: Answer all questions fully. Show your work, it is generally more important than the actual answer.

 

1. [3] What do the following acronyms stand for?

(a) MIPS (b) TLB (c) CPI

 

2. [6] A computer user noticed that his computer that is rated at 100 MIPS takes 4 seconds to sort 10M integers using heap sort, while his friend's computer rated at 80 MIPS takes 3 seconds for the same task using the same C program. Give two reasons why this might be true. Be clear and concise!

(a) (b)

 

3. [2] With a 100 MHz clock, how long does a program with a 2.31 CPI take to run if it has 200M instructions? Give your answer in seconds.

 

4. [10] The following diagram represents an 8 index, 2-way set associative cache with 1 word blocks. Label the cache indices, and using an LRU replacement policy show the tag entries in binary or hex for the items in the cache after the reference sequence:

64, 256, 68, 260, 64, 128, 256

 

 

5. [15] Following an exception, the operating system has saved the exception PC, and registers 1-31 in that order in 32 words beginning at location Z and following in increasing memory. Register $t0 contains the address Z. If the instruction was a load

(lw rt, address(rs)), with the MIPS instruction format

Opcode[31:26], rs[25:21], rt[20:16], address[15:0],

write a MIPS instruction sequence to figure out the effective address of the memory word that the load was referencing, and leave that address in $v0. (ALTERNATIVE: DESCRIBE IN TEXT.)

 

6. [10] (a) Suppose a program executes 1 million instructions with 1.4 memory references per instruction, on average. Disregarding memory stall cycles, the program has a CPI of 2.0. If our cache has a 10% miss rate, and a cache miss costs 12 cycles, what is the actual CPI?

 

(b) Suppose the cache is enlarged to reduce miss rate for the same program to 5%. The larger cache means increasing the cycle time from 10 to 14 ns. The longer cycle means reducing the miss penalty from 12 to 8 cycles. How will the program perform with this new organization, stated as a % faster or slower than the previous formulation. Please show your work.

 

Use the following MIPS assembly code for the next two questions:

addi $6, 63 # 1

and $8, $6, $9 # 2

sub $12, $7, $10 # 3

lw $11, 4($12) # 4

add $13, $11, $12 # 5

7. [5] List the "read after write" data dependences in the preceding code. Recall, a data depencence exists between two instructions X and Y if X occurs before Y AND Y reads a register written by X. Use the notation X -> Y to indicate a dependence between instruction X and Y.

 

8. [18] Given a 5 stage pipelined machine that implements stalling, answer the following questions. (Assume that results from loads are available 2 cycles later and results from ALU operations are available 3 cycles later.)

 

a) Fill in the drawing showing the state of the pipeline during the 5th cycle of execution of the above instruction sequence. Indictate which instruction is in which stage, and show bubbles, if any, with an "x".

b) How many cycles would be needed to execute the instruction sequence? Show your work.

 

c) What is the average CPI for the above sequence of instructions?

 

Now suppose you implement forwarding, removing the delays for ALU operations, and reducing the load delay to one cycle. Please answer the following questions, based on this new organization.

 

d) Fill in the drawing showing the state of the pipeline during the 5th cycle of executing the above instruction sequence. Indictate which instruction is in which stage, indicating bubbles with an "x".

 

e) How many cycles would be needed to execute the above instruction sequence? Show your work.

 

f) What is the average CPI for the above sequence of instructions?

 

g) How much faster is the machine with forwarding than the machine with stalling?

 

h) Suppose you are the compiler/assembler. Can you reorder the instruction sequence to improve performance for the machine that implements forwarding? If yes, give the new sequence; if no, explain why not.

 

9. [7] Consider a 1024 entry, direct mapped, write through cache with blocks of size 16B.

(a) Show on the diagram how the address word is partitioned into its three constituent parts, and label each part.

 

31 29 27 25 23 21 19 17 15 13 11 9 7 5 3 1

+----------------------------------------------------------------+

| |

+----------------------------------------------------------------+

30 28 26 24 22 20 18 16 14 12 10 8 6 4 2 0

 

(b) How many total bits of memory will be required to implement this cache (include the memory contents of the cache)?

 

10. [10] Given Memory:
Address Contents Page Table Address: 0000e0a8
000000ac
000000b0
000000b4
. . .
000080ac
000080b0
000080b4
. . .
0000e0ac 80000000
0000e0b0 8000000e
0000e0b4 8000a0b4

Assuming 4K pages and "big-endian" addressing, i.e. the 0 byte of a word is the msb end, what are the contents of the byte memory location at the virtual address 000020b7?

 

11. [20] Explain the following concepts

(a) DMA (b) Memory Mapped I/O (c) Write Buffer (d) Interleaved memory (e) Load delay slot (f) NaN (g) MIPS Pseudoindirect addressing (h) CISC (i) "Three C's" (j) Branch Prediction Buffer

 

12 [20] Write in MIPS assembly language a recursive procedure to compute X!, where X! = X*(X-1)! Observe full calling conventions and include all assembler directives. (ALTERNATIVE: EXPLAIN IN TEXT.)

 

13. [4] EXTRA CREDIT. Explain clearly the distinction between a TLB Miss and a Page Fault?