HW3, selected solutions (problem 3.3, parts c, d, and e)

To clarify the construction of the pipeline, here are the 5 types of instructions and what each one does in the ALU1, MEM, and ALU2 stages:

1) Register-Memory (R-M) ALUop: ALUop Rdest, Rsrc, offset(Rmem)
   - ALU1: calculates offset + Rmem
   - MEM: loads data from memory
   - ALU2: calculates Rsrc ALUop (data loaded); the result goes to Rdest

2) Register-Register (R-R) ALUop: ALUop Rdest, Rsrc1, Rsrc2
   - ALU1 and MEM: do nothing
   - ALU2: calculates Rsrc1 ALUop Rsrc2; the result goes to Rdest

3) Branch: Branch Rsrc1, Rsrc2, offset(Rmem)
   - ALU1: calculates offset + Rmem
   - MEM: does nothing
   - ALU2: compares Rsrc1 and Rsrc2

4) Load: Load Rdest, offset(Rmem)
   - ALU1: calculates offset + Rmem
   - MEM: loads the data at address offset + Rmem (destined for Rdest)
   - ALU2: does nothing

5) Store: Store Rsrc, offset(Rmem)
   - ALU1: calculates offset + Rmem
   - MEM: stores Rsrc into memory
   - ALU2: does nothing

We will use the following diagram to describe where data forwarding should occur.

TIME (in cycles)
    1    2    3    4    5    6    7    8    9    10
A)  IF   RF   ALU1 MEM  ALU2 WB
B)       IF   RF   ALU1 MEM  ALU2 WB
C)            IF   RF   ALU1 MEM  ALU2 WB
D)                 IF   RF   ALU1 MEM  ALU2 WB
E)                      IF   RF   ALU1 MEM  ALU2 WB

Data forwarding should occur when a data value has been computed, has not yet been written back to the register file, and is needed by some instruction that is later in the pipeline. There are only 2 places in the pipeline that compute values which may be needed by a later instruction:

1) Right after the MEM stage: load instructions may produce a value needed by future instructions.
2) Right after ALU2: all ALUops (both R-R and R-M ALUops) may produce a value needed later.

There seems to be more than one data forwarding solution. Consider the case where instruction A (in the diagram) is a load into R5, and instruction B is an R-R ALUop using R5. The value is ready after cycle 4 (A's MEM stage), but it is not needed until cycle 6 (B's ALU2 stage). There are 2 places you could forward the value: from MEM/ALU2 in instruction A to ALU1/MEM in instruction B, or from ALU2/WB in instruction A to MEM/ALU2 in instruction B.
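The two forwarding options in this example follow directly from the cycle numbers in the diagram. The sketch below is a minimal illustration, assuming no stalls and one new instruction per cycle as drawn; it lists every pair of pipeline latches through which the value could pass between the cycle it is produced and the cycle it is used. The helper names (`stage_cycle`, `latch_after`, `forwarding_options`) are hypothetical, written only for this illustration.

```python
# Six-stage pipeline from the diagram, in order.
STAGES = ["IF", "RF", "ALU1", "MEM", "ALU2", "WB"]


def stage_cycle(offset: int, stage: str) -> int:
    """Cycle (1-based) in which an instruction issued `offset` cycles after A
    occupies `stage`, assuming no stalls (A: offset 0, B: 1, ..., E: 4)."""
    return 1 + offset + STAGES.index(stage)


def latch_after(offset: int, cycle: int) -> str:
    """Pipeline latch that the instruction writes at the end of `cycle`."""
    s = cycle - 1 - offset  # index of the stage occupied during `cycle`
    if not 0 <= s < len(STAGES) - 1:
        raise ValueError("instruction is not between two stages at that cycle")
    return f"{STAGES[s]}/{STAGES[s + 1]}"


def forwarding_options(prod_offset: int, ready_stage: str,
                       cons_offset: int, need_stage: str) -> list:
    """Every (cycle, source latch, destination latch) through which a value
    produced at the end of the producer's `ready_stage` can reach the consumer
    before the consumer's `need_stage` begins."""
    ready = stage_cycle(prod_offset, ready_stage)  # value exists at end of this cycle
    needed = stage_cycle(cons_offset, need_stage)  # value is used during this cycle
    return [(t, latch_after(prod_offset, t), latch_after(cons_offset, t))
            for t in range(ready, needed)]


# A: load into R5 (value ready after MEM); B: R-R ALUop reading R5 in ALU2.
for cycle, src, dst in forwarding_options(0, "MEM", 1, "ALU2"):
    print(f"end of cycle {cycle}: A's {src} -> B's {dst}")
# end of cycle 4: A's MEM/ALU2 -> B's ALU1/MEM
# end of cycle 5: A's ALU2/WB -> B's MEM/ALU2
```

The same helper applied to a load in A whose result D uses as a base register, `forwarding_options(0, "MEM", 3, "ALU1")`, returns an IF/RF option at the end of cycle 4 and an RF/ALU1 option at the end of cycle 5. Only the second is practical, because D has not been decoded yet at the end of cycle 4.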
For the purposes of this solution, I will often forward the data as soon as it becomes available, even if the receiving instruction doesn't use the data right away. However, suppose instruction A is a load and instruction D uses the loaded value. A's WB stage (cycle 6) comes after D's RF stage (cycle 5), so A needs to forward the loaded value to D. The value is available after cycle 4, but since RF is the instruction decode stage, we can't know that D needs it until after cycle 5; that is when we forward it, from A's ALU2/WB latch to D's RF/ALU1 latch. Note also that instruction A will never need to forward to instruction E, since E reads its registers after A writes to them.

Source    Source type  Dest      Dest type          Reason
MEM/ALU2  load         ALU1/MEM  store              need to store register
MEM/ALU2  load         ALU1/MEM  ALUop, branch      need register in ALU2
MEM/ALU2  load         RF/ALU1   all but R-R ALUop  need register in ALU1
MEM/ALU2  load         RF/ALU1   store              need to store register
MEM/ALU2  load         RF/ALU1   ALUop, branch      need register in ALU2
ALU2/WB   load         RF/ALU1   all but R-R ALUop  need register in ALU1
ALU2/WB   load         RF/ALU1   store              need to store register
ALU2/WB   load         RF/ALU1   ALUop, branch      need register in ALU2
ALU2/WB   ALUop        MEM/ALU2  ALUop, branch      need register in ALU2
ALU2/WB   ALUop        ALU1/MEM  store              need to store register
ALU2/WB   ALUop        ALU1/MEM  ALUop, branch      need register in ALU2
ALU2/WB   ALUop        RF/ALU1   all but R-R ALUop  need register in ALU1
ALU2/WB   ALUop        RF/ALU1   store              need to store register
ALU2/WB   ALUop        RF/ALU1   ALUop, branch      need register in ALU2

Compact table (without the Reason column):

Source    Source type  Dest      Dest type
MEM/ALU2  load         ALU1/MEM  store, ALUop, branch
MEM/ALU2  load         RF/ALU1   all types (load, store, branch, ALUop)
ALU2/WB   load, ALUop  RF/ALU1   all types (load, store, branch, ALUop)
ALU2/WB   ALUop        MEM/ALU2  ALUop, branch
ALU2/WB   ALUop        ALU1/MEM  store, ALUop, branch

Now let's look at where the stalls occur. Basically, stalls occur when a result is needed but is not yet available because it hasn't been computed yet.
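The rule just stated can be phrased as a comparison of cycle numbers: the consumer's stage must begin in a cycle strictly after the cycle in which the producer finishes computing the value. The sketch below is a minimal illustration, assuming the stall-free timing of the diagram and that the needed forwarding path from the tables exists; the stage-index encoding and the name `stall_cycles` are illustrative choices, not part of the original solution.

```python
# Stage indices of the six-stage pipeline IF RF ALU1 MEM ALU2 WB.
STAGES = {"IF": 0, "RF": 1, "ALU1": 2, "MEM": 3, "ALU2": 4, "WB": 5}


def stall_cycles(ready_stage: str, need_stage: str, distance: int = 1) -> int:
    """Stall cycles for a consumer issued `distance` cycles after the producer,
    assuming the required forwarding path exists.

    The value exists at the end of the producer's `ready_stage` (MEM for a
    load, ALU2 for an ALUop), so the consumer's `need_stage` must begin in a
    strictly later cycle.
    """
    ready = STAGES[ready_stage]             # producer's stage, as a cycle offset
    needed = STAGES[need_stage] + distance  # consumer's stage, same time base
    return max(0, ready - needed + 1)


print(stall_cycles("MEM", "ALU1"))     # load, next instr needs it in ALU1: 1
print(stall_cycles("MEM", "ALU1", 2))  # load, instr C needs it in ALU1: 0
```

With `distance=1` this covers the adjacent-instruction cases worked through next. The second call shows why a load followed two instructions later by a use in ALU1 needs no stall: the MEM/ALU2 to RF/ALU1 forwarding path in the tables delivers the value in time.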
If the first instruction is a load, there will be a one-cycle stall with the next instruction if the next instruction needs the loaded register in its ALU1 stage (as its base register Rmem).

If the first instruction is an ALUop, there are 2 places where its result may be needed by the next instruction:

1) If the result is needed in MEM (as the value to be stored by a store), there will be a one-cycle stall.
2) If the result is needed in ALU1 (as the base register Rmem of a load, store, branch, or R-M ALUop), there will be a two-cycle stall.

First instr (in RF/ALU1)  Next instr (in IF/RF)           Needed in  Length of stall
Load                      load, store, branch, R-M ALUop  ALU1       1
ALUop                     store                           MEM        1
ALUop                     load, store, branch, R-M ALUop  ALU1       2
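The stall table can be reproduced by listing, for each instruction type, the stages in which it needs a source register (taken from the five instruction descriptions at the top) and applying the "needed before available" rule to adjacent instructions. This is a minimal sketch; the dictionaries `PRODUCES` and `NEEDS` and the function names are hypothetical, written only for this illustration.

```python
# Stage indices of the six-stage pipeline IF RF ALU1 MEM ALU2 WB.
STAGES = {"IF": 0, "RF": 1, "ALU1": 2, "MEM": 3, "ALU2": 4, "WB": 5}

# Stage at the end of which each producing type's result exists.
PRODUCES = {"load": "MEM", "ALUop": "ALU2"}

# Stages in which each consuming type needs a source register.
NEEDS = {
    "load": ["ALU1"],               # Rmem
    "store": ["ALU1", "MEM"],       # Rmem, Rsrc (value to store)
    "branch": ["ALU1", "ALU2"],     # Rmem, Rsrc1/Rsrc2
    "R-M ALUop": ["ALU1", "ALU2"],  # Rmem, Rsrc
    "R-R ALUop": ["ALU2"],          # Rsrc1, Rsrc2
}


def stall_cycles(producer: str, need_stage: str, distance: int = 1) -> int:
    """Stall cycles before a consumer issued `distance` cycles later can use the result."""
    ready = STAGES[PRODUCES[producer]]
    needed = STAGES[need_stage] + distance
    return max(0, ready - needed + 1)


def stall_table() -> list:
    """(producer, next instr type, stage needed, stall) for every adjacent pair that stalls."""
    rows = []
    for producer in PRODUCES:
        for consumer, stages in NEEDS.items():
            for stage in stages:
                n = stall_cycles(producer, stage)
                if n:
                    rows.append((producer, consumer, stage, n))
    return rows


for producer, consumer, stage, n in stall_table():
    print(f"{producer:<5} -> {consumer:<9} (needs it in {stage}): {n}")
```

R-R ALUops never appear as stalled consumers, since they need their operands only in ALU2, by which point both a load's result and an ALUop's result can be forwarded.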