CSE 378 Fall 2006
Lab 3: Pipelining (Wikified Version)
Assigned: 10/30/2006

Description
The processor constructed in the previous labs implements a significant part of the MIPS instruction set, but does so very slowly. The problem is that the clock speed is constrained by the longest path a signal can take in a single clock cycle. For the first two labs this path started at the register file, went through the ALU, the memory, and finally back into the register file. In this lab, we'll be shrinking the longest path significantly by breaking the datapath into 5 separate pieces or stages. Adding registers between stages will mean that the longest path a signal must traverse in a single cycle will decrease dramatically. This speed-up will come at a cost, because it will now take 5 cycles for an instruction to complete. To offset this slowdown we'll be able to run independent instructions in each stage, for a maximum of 5 instructions "in-flight", leading to a maximum throughput of 1 instruction per cycle.

Background: (COD: 3e pp 370-374, 384-402)

Phase 0: Administration
This lab will take place in the same workspace as the previous labs. The files for this lab are provided as an archived design. You will need to restore the design, add it to the workspace, and copy several design files from lab2 into the new design. Download the archived design lab3.zip and follow these steps:
At this point the files are all available. Now we need to tell Active-HDL about the new design.
Phase 1: Partitioning and Pipeline Registers
The fundamental idea behind pipelining is to separate the datapath into individual stages. Each stage will take one clock cycle to complete, and can contain one instruction. This phase will describe the elements of the stages and the pipeline registers that function as the barrier between stages. A pipeline register saves the values from the previous stage so that each stage can perform an independent instruction. In addition, the pipeline registers must have a reset signal so that their initial state is known, and a load enable signal. The load enable will be used later to ensure that certain registers do not update their values in special cases.

WARNING: The test fixtures are very name-sensitive. Double-check all wire and component names.

Part A. Organizing Stage Logic
The following stage descriptions explain the components that should be in each stage. Please scan the descriptions before making any changes, and be sure to refer back as things progress.
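The pipeline register behavior described in the Phase 1 introduction (reset to a known state, update only when load enable is high) can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog you will write in piperegisters.v. The class name, the field names, and the choice to give reset priority over load enable are all assumptions made for illustration.

```python
class PipelineRegister:
    """Behavioral model of one pipeline register (hypothetical names)."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        # Start in a known state: every field is zero.
        self.state = {name: 0 for name in self.fields}

    def clock(self, inputs, reset=False, load_enable=True):
        """Simulate one rising clock edge and return the stored values."""
        if reset:
            # Assumption: reset takes priority over load enable.
            self.state = {name: 0 for name in self.fields}
        elif load_enable:
            # Capture the values produced by the previous stage.
            self.state = {name: inputs[name] for name in self.fields}
        # Otherwise hold the previous values (load enable is low).
        return dict(self.state)


reg = PipelineRegister(["instruction", "next_pc"])
after_reset = reg.clock({"instruction": 0x1234, "next_pc": 4}, reset=True)
after_load = reg.clock({"instruction": 0x1234, "next_pc": 4})
after_hold = reg.clock({"instruction": 0x5678, "next_pc": 8}, load_enable=False)
```

In the last call the inputs change but load enable is low, so the register keeps the values captured on the previous edge.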
Part B: Instruction Decode (ID) Stage
This part provides a strategy for organizing the ID stage and completing the pipeline register that ends the stage. The file piperegisters.v contains an incomplete IDEXReg module. However, before completing the register there are other issues to address.
At this point it's time to address the IDEXReg component defined in piperegisters.v. Your task is to define the input and output ports for the additional data needed in the EX stage and the control signals needed in the EX, MEM, and WB stages. As a starting point, ID_RegWrite is an input control signal and EX_RegWrite is the output of the same signal.
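To see why a control signal such as RegWrite needs a port on each pipeline register, it can help to watch one value move down the pipeline. The sketch below is a Python model under stated assumptions, not the lab's Verilog: the `advance` helper and the stage keys are hypothetical, and only the ID_RegWrite to EX_RegWrite hop is named in this handout.

```python
def advance(stages, new_id_signals):
    """Shift control signals one stage down the pipeline on a clock edge."""
    return {
        "ID": dict(new_id_signals),  # signals decoded this cycle
        "EX": stages["ID"],          # e.g. ID_RegWrite becomes EX_RegWrite
        "MEM": stages["EX"],
        "WB": stages["MEM"],
    }


BUBBLE = {"RegWrite": 0}
ORDER = ("ID", "EX", "MEM", "WB")

stages = {name: BUBBLE for name in ORDER}
history = []
# One instruction that writes a register, followed by three bubbles.
for signals in ({"RegWrite": 1}, BUBBLE, BUBBLE, BUBBLE):
    stages = advance(stages, signals)
    history.append([stages[name]["RegWrite"] for name in ORDER])
```

Each row of `history` is one cycle. The single 1 moves from ID to WB, where it would finally enable the register file write.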
Part C: Execute (EX) Stage
This part provides a strategy for organizing the EX stage and completing the pipeline register that ends the stage. The file piperegisters.v contains an incomplete EXMEMReg module. However, before completing the register there are other issues to address.
Your task is to define the input and output ports for the control signals that get passed through the EX stage for use in the MEM and WB stages.
Part D: Memory (MEM) Stage
The memory stage provides the address and data for writing to memory or reading from memory. It sends the value read from memory and the address (ALUOut) on to the WB stage. The file piperegisters.v contains an incomplete MEMWBReg module. First there are some bookkeeping tasks.
Once these things are complete it's time to finalize the MEMWBReg from piperegisters.v. You need to add ports and logic for the control signals that get passed through the MEM stage for use in the WB stage.
Part E: Write-Back (WB) Stage
This stage basically just separates the memory access time from the update to the register file. At this point, all the pipeline registers are in place, and the last task is to rename some signals to ensure that the proper data is being used.
Part F: Testing
The cpu.bde has a fairly limited number of input and output ports. To make the test fixtures more effective, there is a file cpu_wrapper.v that "peeks" inside the CPU and exposes a wide range of signals for the test fixtures. Test the updated cpu.bde with test fixture phase1_tf.v. This test fixture runs through all of the non-control instructions and verifies that everything is connected properly. Problems with branch or jump instructions will be addressed in the test fixture for Phase 2.

Phase 2: Branching and Delay Slots
The processor from Lab 2 decided the next PC value in every cycle based on the current instruction. Each instruction took a single cycle, so a branch instruction would determine the resulting PC before the next clock cycle. After pipelining, the control signals are not available until the cycle after an instruction is fetched. This causes a control hazard for branch instructions, because we do not know whether the branch is taken until the following instruction has been fetched. There are different ways to deal with this problem, and the MIPS designers decided to turn the hazard into a feature by defining a delay slot. The delay slot is the instruction directly after a branch or jump, and it is always executed, regardless of the branch outcome. This means that no instructions are squashed on branches, because the branch result is known before the instruction after the delay slot is fetched. (See COD:3e pp 423-424 for more on delay slots)

Part A. Branch Comparisons
In the preceding labs, branch comparisons were performed by forcing a subtract in the ALU. For single-cycle machines this was an efficient reuse of logic. In a pipelined machine we want to make the branch decision in the ID stage, so we need the result of the comparison in the ID stage.
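As a rough illustration of a branch decision made without the ALU, the Python sketch below compares the two register values directly, the way a dedicated comparator in the ID stage might. The function name and the string opcodes are hypothetical, and only beq and bne are modeled. Your design may need to handle other branch types.

```python
def branch_taken(opcode, rs_value, rt_value):
    """Decide a branch from the two register values read in the ID stage."""
    # Compare as 32-bit quantities, as a hardware comparator would.
    equal = (rs_value & 0xFFFFFFFF) == (rt_value & 0xFFFFFFFF)
    if opcode == "beq":
        return equal
    if opcode == "bne":
        return not equal
    return False  # not a branch modeled in this sketch


taken_beq = branch_taken("beq", 7, 7)
taken_bne = branch_taken("bne", 7, 7)
```

An equality check needs no subtraction, so the result is available as soon as the register file outputs are valid.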
Part B. PC Address Computation
In labs 1 and 2 the branch address was computed by adding the offset to the address of the next instruction. The simplest way to do this computation was to add the branch offset to PC+4. That would ensure that the proper instruction was targeted. Now the branch decision is made one cycle later, by which time the PC has already advanced, so PC+4 is actually two instructions after the branch. There are a number of ways to compute the correct branch target address. The deciding factor is the complexity of the logic required. The minimal solution is to pass both PC and PC+4 (IF_NextPC) to the pcaddresscomputer. Branch and jump address computations use PC, while PC+4 is used as the default, or for branches that aren't taken.
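The selection described above can be sketched in Python as follows. This is an illustrative model, not the pcaddresscomputer itself: the function names and arguments are assumptions, and `pc` stands for the address currently being fetched, which is the delay slot when the branch or jump is in the ID stage.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc, instruction, is_branch_taken, is_jump):
    """Choose the next fetch address from PC, PC+4, and the instruction in ID."""
    if is_jump:
        # Jump: upper 4 bits of PC joined with the 26-bit target, times 4.
        target = instruction & 0x03FFFFFF
        return (pc & 0xF0000000) | (target << 2)
    if is_branch_taken:
        # Branch: PC plus the sign-extended word offset, times 4.
        return (pc + (sign_extend16(instruction) << 2)) & 0xFFFFFFFF
    # Default, or a branch that is not taken: PC + 4.
    return (pc + 4) & 0xFFFFFFFF


# A beq at 0x100 with offset 3 is in ID while 0x104 is being fetched.
forward_target = next_pc(0x104, 0x10000003, True, False)
not_taken = next_pc(0x104, 0x10000003, False, False)
```

Because `pc` already points one instruction past the branch, adding the offset to it gives the same target that PC+4 plus offset gave in the single-cycle design.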
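Part C. Delay Slots and Return Addresses
The addition of the delay slot changes the return address saved by jump-and-link instructions, because the instruction directly after the jump is the delay slot, which has already been executed by the time the function returns. The return address must therefore be the address of the instruction after the delay slot:
PC + 8 = (PC + 4) + 4 = NextPC + 4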
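A quick numeric check of this relation, written as a Python sketch. The function name is hypothetical, and `jump_pc` is assumed to be the address of the jump-and-link instruction itself.

```python
def return_address(jump_pc):
    """Return address for a jump-and-link when a delay slot follows it."""
    next_pc = (jump_pc + 4) & 0xFFFFFFFF  # address of the delay slot
    return (next_pc + 4) & 0xFFFFFFFF     # instruction after the delay slot


link_value = return_address(0x00400000)
```

Here the delay slot sits at 0x00400004, so the function returns to 0x00400008.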
Part D. Testing
Test the updated cpu.bde with test fixture phase2_tf.v. This test fixture is designed to exercise the branch and jump instructions to ensure that the correct comparisons are made and that the return addresses are computed properly.

Turnin
To turn in your lab, complete all phases and use the Design -> Archive Design command in Active-HDL on the final product to produce a .zip file containing all the essential files. Put this somewhere accessible via attu and use "turnin -c cse378 'your file'" to submit it for grading.
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, (206) 543-1695 voice, (206) 543-2969 FAX