CSE378 Exam

. (2 pts. per part)

Referring to figure 5.17 (textbook p.358), for a single-cycle MIPS implementation:

Locate the unit marked "shift left 2".

Give an example of an instruction whose correct processing relies on that unit.

Give an example of an instruction whose processing does not depend on it.

For an instruction such as the one you gave just above, what prevents the unit from carrying out the shift anyway?

a branch or jump

anything except a branch or jump

Nothing! [The shift takes place, but the new address which is ultimately generated never makes it to the PC.]
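
The point of the last answer can be sketched in a few lines of Python. This is only an illustrative model, not a transcription of fig. 5.17: the names `sign_extend16` and `next_pc` and their parameters are assumptions made for this example. The shifted offset and the branch target are computed on every instruction, and only the final multiplexer decides whether that target reaches the PC.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc, imm16, take_branch):
    """Return (value written to the PC, branch target) for one instruction.

    Hypothetical model: the shift-left-2 unit and the branch adder always
    run; only the mux in front of the PC chooses between the two results.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    shifted = sign_extend16(imm16) << 2            # the shift happens regardless
    branch_target = (pc_plus_4 + shifted) & 0xFFFFFFFF
    chosen = branch_target if take_branch else pc_plus_4  # PC-source mux
    return chosen, branch_target
```

Calling `next_pc(0x1000, 3, False)` still produces a branch target in the second element of the result, but the first element (the value that would be written to the PC) is simply PC + 4.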

.
In the single-cycle implementation, an instruction is supposed to be fully processed within just one clock cycle. Looking at the schematic (fig. 5.17), we can see that signals have to propagate through a number of components before the final result is achieved. Let's focus on the Data Memory. As the cycle begins, that unit has some signals already present on its inputs.

(1 pt. per) List all the inputs of the Data Memory and state briefly what their purpose is.

One of the Data Memory's inputs comes from the ALU. As the cycle begins, the ALU will not yet have computed a new value. What prevents the Data Memory from immediately using the previous, possibly incorrect value, thus producing incorrect results at its output?
[There are four inputs to list, including the two control signals.]

Timing, specifically the timing of the Data Memory's own operation. Since this is single-cycle, it is not correct to say that the control unit delays setting the read or write control value until the correct inputs are present. All the control signals are set at the beginning of the cycle and never changed. The whole circuit must be timed so that the Data Memory does not perform any write operations until it is safe to do so.

.
Figure 5.33 of the textbook (p. 383) shows the datapath and control lines of a multi-cycle MIPS implementation.

(1/2 pt. ea., 7 points total) Writing directly on that sheet of the handout, show the required value for each control line during the Fetch cycle. If any value is a "don't care", indicate it as X. ("Each control line" means each of the lines coming out of the oval marked Control.)

(3 pts.) Comparing the single-cycle (fig. 5.17) with the multi-cycle (fig. 5.33), the latter has an Instruction Register on its datapath, not present in the single-cycle version. Explain this addition (i.e., why is it needed now and wasn't before?)
[There are 13 signals. This was scored as .5 points per signal, rather than 1 pt. each as originally indicated, and the total was rounded up. The biggest mistake was forgetting that the PC is incremented during the fetch cycle, which involves about 5 of the control signals. The second biggest mistake was setting RegWrite to X and not to 0. RegWrite must be 0 because, if it were X, it could sometimes be 1 and write bogus data to the register file, corrupting registers that we could use later. That is, it changes architectural state, and this should never be a don't care condition. It wouldn't break the fetch cycle of the current instruction, but it could break subsequent instructions and the correctness of the whole program.]

The reason? In the single-cycle version, there were two memories, one for instructions and one for data, and the PC was not incremented until the end of the cycle. In the multi-cycle case, there is only one memory, which is used for data as well as instruction fetch. Thus, the fetched instruction needs to be saved, since the memory is going to be reused. It is not enough to say "the instruction is needed even after the fetch stage in multi-cycle". This is simply saying that the data in the instruction word is needed for the entire length of instruction execution, which is also true for the single-cycle case. That doesn't explain why there isn't also an IR for the single-cycle design.
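
Returning to the fetch-cycle control values from the first part of this answer, the sketch below records one consistent set of settings and checks the rule stated above: a signal that can change state must never be left as X. The values follow the usual textbook control FSM for this datapath as best recalled here, not the graded key, and the names `FETCH`, `WRITE_ENABLES` and `unsafe_dont_cares` are assumptions made for this example.

```python
# Fetch-cycle settings (assumed from the usual textbook FSM, not the exam key).
FETCH = {
    "MemRead": "1",      # read the instruction from memory
    "IorD": "0",         # memory address comes from the PC
    "IRWrite": "1",      # latch the fetched word into the IR
    "ALUSrcA": "0",      # ALU operand A = PC
    "ALUSrcB": "01",     # ALU operand B = constant 4
    "ALUOp": "00",       # add
    "PCSource": "00",    # PC gets the ALU result (PC + 4)
    "PCWrite": "1",      # unconditionally update the PC
    "PCWriteCond": "0",  # has no effect while PCWrite is 1
    "MemWrite": "0",     # must not be X: would corrupt memory
    "RegWrite": "0",     # must not be X: would corrupt the register file
    "MemtoReg": "X",     # only matters when RegWrite is 1
    "RegDst": "X",       # only matters when RegWrite is 1
}

# Signals that enable a write to architectural or datapath state.
WRITE_ENABLES = ("PCWrite", "IRWrite", "MemWrite", "RegWrite")


def unsafe_dont_cares(signals):
    """Return the write-enable signals that were left as don't-care."""
    return [name for name in WRITE_ENABLES if signals[name] == "X"]
```

With the table as written, `unsafe_dont_cares(FETCH)` returns an empty list; changing RegWrite to "X" makes the check report it, which mirrors the second most common mistake described above.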

. (2 pts. ea.)

On a handout is an unlabeled finite-state diagram for a multi-cycle implementation of some machine.

For each of the following, answer as best you can; if no answer is possible, explain what other information would be needed in order to give a good answer.

What is the lower bound for the CPI for this machine (i.e., the theoretical smallest value)?

What is the upper bound for the CPI for this machine (i.e. the theoretical largest value)?

What is the average CPI for this machine?

shortest: 3.

You can get this by following the shortest path on the diagram (using the top path from the third state!). Many people missed the path on the top and ended up with a CPI of 6 here, following the next shortest route.

7, following the longest path.

Cannot determine the average CPI without knowing the instruction mix: we need to know the percentage of instructions that follow each possible path on the diagram.
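
To make the missing information concrete, here is a minimal sketch of the calculation once a mix is known. The function name `average_cpi` and the example percentages are hypothetical; only the path lengths 3, 6 and 7 come from the answers above, and the real diagram may have other paths.

```python
def average_cpi(paths):
    """Weighted average CPI from (cycles, fraction_of_instructions) pairs."""
    total_fraction = sum(fraction for _, fraction in paths)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(cycles * fraction for cycles, fraction in paths)


# Hypothetical mix: the percentages are made up for illustration only.
example_mix = [(3, 0.10), (6, 0.50), (7, 0.40)]
```

For this invented mix the average works out to roughly 6.1 cycles per instruction. Whatever the mix, the result always falls between the lower bound of 3 and the upper bound of 7.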

. (2 pts. ea.)

On the handout there is a very high-level view of a 5-stage MIPS pipeline.

Explain the vertical bar named MEM/WB.

What exceptions, if any, can be generated during the MEM stage?

What exceptions, if any, can be generated during the WB stage?

In certain instructions, a value is produced during the ID stage that is not needed again until the WB stage. Give an example of an instruction which has this characteristic.

For an instruction like the one just mentioned, how does the value produced during ID make it to the WB stage?

MEM/WB: It's a set of pipeline registers storing all the necessary information coming from the memory stage to be used to execute the WB part of the instruction. This includes memory output, ALU output, destination register, and all the control signals. You didn't have to list all the contents, but you also couldn't just list one of these and say that this is what the pipeline register is used for.

MEM exceptions: page fault, illegal address

WB exceptions: none

The value meant here was the destination register index, determined in ID (by selecting from rt or rd) and passed through to the WB stage, where it is fed to the register file to decide which register to write to. Note that for this to work, you need an instruction that actually writes to a register! That means add, lw, sll, etc. are all fine. For some reason, many people put down sw. This is wrong, because sw writes to memory and not to the register file! Thus, it never uses the destination register field and doesn't work as an example here. Other examples that don't work include branches and jumps, for the same reason.

It gets passed along in the pipeline registers in the sequence ID/EX -> EX/MEM -> MEM/WB

.

Consider the following sequence of instructions:

add $s0, $t0, $t2
sub $t2, $s0, $t3

Assume the MIPS five-stage pipeline.

(2 pts.) Identify the hazards, if any.

(3 pts.) If there are hazards, explain how they are handled. Draw a picture if it helps. (The implication is that you should handle hazards as efficiently as possible. For example, "wait until one instruction clears the pipeline before issuing the next one" would be effective in eliminating hazards but would be unacceptable.)

There is a data hazard: the sub instruction reads register $s0, which the add instruction writes.

Forward the ALU value held in the EX/MEM pipeline register to the ALU input in the EX stage.
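
The forwarding decision can be sketched as a small predicate. This is a simplified model of the usual EX-hazard check, covering only one ALU operand and only the EX/MEM source; the function name `forward_from_ex_mem` and its parameters are assumptions made for this example. The register numbers are the standard MIPS numbers for $s0 (16), $t0 (8), $t2 (10) and $t3 (11).

```python
def forward_from_ex_mem(ex_mem_reg_write, ex_mem_rd, id_ex_src):
    """Return True if an ALU operand should be taken from EX/MEM.

    ex_mem_reg_write: the older instruction will write a register
    ex_mem_rd:        destination register number of the older instruction
    id_ex_src:        source register number of the instruction now in EX
    """
    return bool(
        ex_mem_reg_write
        and ex_mem_rd != 0            # $zero is never forwarded
        and ex_mem_rd == id_ex_src
    )


# add $s0, $t0, $t2  followed by  sub $t2, $s0, $t3
ADD_RD = 16                 # $s0
SUB_RS, SUB_RT = 16, 11     # $s0, $t3
```

For the pair above, `forward_from_ex_mem(True, ADD_RD, SUB_RS)` is true, so the first ALU operand of the sub is forwarded, while the second operand ($t3) is read from the register file as usual.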

. (3 pts.)
Assuming the MIPS five-stage pipeline: give a sequence of two consecutive instructions which have a data hazard which cannot be resolved by forwarding alone.

lw $t0, 0($sp)
addi $t1, $t0, 1

ADDENDUM: a sw instruction following a lw has a data hazard which can be resolved by forwarding from the MEM/WB pipeline register to the MEM stage.
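
For the lw/addi pair, the loaded value is not available until the end of the MEM stage, so the pipeline has to stall one cycle before forwarding can help. A minimal sketch of the usual load-use check follows; the function name `needs_load_use_stall` and its parameters are assumptions made for this example, and the register numbers are the standard MIPS numbers for $t0 (8) and $t1 (9).

```python
def needs_load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction in ID must wait for a load in EX.

    id_ex_mem_read: the instruction in EX is a load
    id_ex_rt:       register the load will write
    if_id_rs/rt:    register fields of the instruction currently in ID
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $t0, 0($sp)  followed by  addi $t1, $t0, 1
LW_RT = 8                    # $t0, written by the load
ADDI_RS, ADDI_RT = 8, 9      # $t0 is read; $t1 occupies the rt field
```

Here `needs_load_use_stall(True, LW_RT, ADDI_RS, ADDI_RT)` is true, so one bubble is inserted, after which the value can be forwarded from MEM/WB. This simple check compares the rt field even when it names a destination, which is conservative but safe.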

. 6 pts.

You work for a company that makes MIPS-compatible CPU chips. A new type of on-chip memory technology has become available, which could serve for the Data/Instruction memory of your CPU. It operates 20% faster than the memory currently used.

Determine the potential effect on clock cycle time for each of the two designs listed below. If you cannot calculate the impact directly, explain what the effect would depend on and what you would need to know to calculate it exactly.

single-cycle design:

The cycle time is determined by the time taken for signals to propagate through the longest path in the circuit. This path would be for the load instruction, which accesses memory twice: once to fetch the load instruction from memory (this is done for all instructions) and a second time to actually load the value from memory to put into a register. Both of these accesses will take 20% less time, but the reduction in cycle time is less than 20% because the rest of the circuit still requires time for signals to propagate through. The faster memory may make the load instruction not be the longest path anymore, in which case the cycle time would depend on the new longest path. You would need to know the speeds of the other functional units to calculate the cycle time exactly.

pipelined design:

In a pipeline all stages take the same time (one clock cycle), which is the time taken for the slowest operation performed in a stage. If this operation involves accessing memory (i.e. IF or MEM stages), then a 20% faster memory can result in a 20% shorter cycle time, unless there is now a slower operation, in which case the cycle time improvement would be less than 20% (to calculate the exact value would require knowing the time taken for the new slowest operation).
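
Both answers can be checked with a small calculation. All latencies below are hypothetical (in picoseconds) and chosen only for illustration; the function names and the `old`/`new` tables are likewise assumptions. The sketch takes "20% faster" to mean 20% less access time, as the answers above do, and assumes lw remains the longest single-cycle path.

```python
def single_cycle_time(latency):
    """Delay of the lw path: fetch, register read, ALU, data access, write-back."""
    return (latency["mem"] + latency["reg_read"] + latency["alu"]
            + latency["mem"] + latency["reg_write"])


def pipelined_cycle_time(latency):
    """Cycle time is set by the slowest single stage."""
    return max(latency["mem"], latency["reg_read"],
               latency["alu"], latency["reg_write"])


# Hypothetical unit latencies in picoseconds.
old = {"mem": 200, "reg_read": 100, "alu": 200, "reg_write": 100}
new = dict(old, mem=160)  # memory access takes 20% less time
```

With these made-up numbers the single-cycle time drops from 800 to 720 (a 10% reduction, less than 20%), while the pipelined cycle time stays at 200 because the ALU stage is now the slowest operation. Different latencies would give different results, which is why the exact effect cannot be calculated without them.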

First name: _ Last name: Section: _

CSE378 Autumn Quarter
University of Washington
Midterm #2
Friday, November 14, 2003

Closed book, closed notes; no calculators
