











| Original code | Load-latency-hiding code |
|---------------|--------------------------|
| Loop: lw R1, 0(R5)<br>addu R1, R1, R6<br>sw R1, 0(R5)<br>addi R5, R5, -4<br>bne R5, R0, Loop | Loop: lw R1, 0(R5)<br>addi R5, R5, -4<br>addu R1, R1, R6<br>sw R1, 4(R5)<br>bne R5, R0, Loop |

|       | ALU/branch instructions | Memory instructions | clock cycle |
|-------|-------------------------|---------------------|-------------|
| Loop: |                         |                     | 1           |
|       |                         |                     | 2           |
|       |                         |                     | 3           |
|       |                         |                     | 4           |
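The reordered loop moves `addi` between the load and the add, so the store offset changes from 0 to 4 to compensate for R5 having already been decremented. A minimal Python sketch of that equivalence follows. It assumes the load in both versions uses base register R5, and the memory contents, starting address, and function names are hypothetical values chosen only for illustration.

```python
def run_original(mem, r5, r6):
    """Original order: lw, addu, sw, addi, bne."""
    mem = dict(mem)               # work on a copy of the toy memory
    while True:
        r1 = mem[r5 + 0]          # lw   R1, 0(R5)
        r1 = r1 + r6              # addu R1, R1, R6
        mem[r5 + 0] = r1          # sw   R1, 0(R5)
        r5 = r5 - 4               # addi R5, R5, -4
        if r5 == 0:               # bne  R5, R0, Loop (fall through when R5 == 0)
            break
    return mem


def run_reordered(mem, r5, r6):
    """Load-latency-hiding order: lw, addi, addu, sw, bne."""
    mem = dict(mem)
    while True:
        r1 = mem[r5 + 0]          # lw   R1, 0(R5)
        r5 = r5 - 4               # addi R5, R5, -4  (fills the load-use gap)
        r1 = r1 + r6              # addu R1, R1, R6
        mem[r5 + 4] = r1          # sw   R1, 4(R5)   (offset 4 undoes the early decrement)
        if r5 == 0:               # bne  R5, R0, Loop
            break
    return mem


# Hypothetical toy memory: four words at byte addresses 4, 8, 12, 16.
memory = {4: 10, 8: 20, 12: 30, 16: 40}
result_original = run_original(memory, r5=16, r6=5)
result_reordered = run_reordered(memory, r5=16, r6=5)
print(result_original == result_reordered)
```

Both versions add R6 to every array element and stop when R5 reaches zero; only the instruction order and the store offset differ.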

## Code Scheduling on Superscalars: Loop Unrolling

|       | ALU/branch instruction        | Data transfer instruction  | clock cycle |
|-------|-------------------------------|----------------------------|-------------|
| Loop: | addi R5, R5, <mark>-16</mark> | lw R1, 0(R5)               | 1           |
|       |                               | lw R2, 12(R5)              | 2           |
|       | addu R1, R1, R6               | lw R3, <mark>8</mark>(R5)  | 3           |
|       | addu R2, R2, R6               | lw R4, 4(R5)               | 4           |
|       | addu R3, R3, R6               | sw R1, 16(R5)              | 5           |
|       | addu R4, R4, R6               | sw R2, 12(R5)              | 6           |
|       |                               | sw R3, <mark>8</mark>(R5)  | 7           |
|       | bne R5, R0, Loop              | sw R4, 4(R5)               | 8           |

How many cycles does each iteration take? What is the IPC?
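One way to check an answer is to count occupied issue slots directly from the unrolled schedule. The sketch below transcribes the table into a Python list (one row per clock cycle) and assumes that the number of `sw` instructions equals the number of original loop iterations covered by one pass of the unrolled loop. The variable names are illustrative only.

```python
# Each row is one clock cycle: (ALU/branch slot, data-transfer slot).
# None marks an empty issue slot.
schedule = [
    ("addi R5, R5, -16", "lw R1, 0(R5)"),
    (None,               "lw R2, 12(R5)"),
    ("addu R1, R1, R6",  "lw R3, 8(R5)"),
    ("addu R2, R2, R6",  "lw R4, 4(R5)"),
    ("addu R3, R3, R6",  "sw R1, 16(R5)"),
    ("addu R4, R4, R6",  "sw R2, 12(R5)"),
    (None,               "sw R3, 8(R5)"),
    ("bne R5, R0, Loop", "sw R4, 4(R5)"),
]

cycles = len(schedule)
issued = [slot for row in schedule for slot in row if slot is not None]
instructions = len(issued)

# Assumption: one store per original iteration, so counting stores
# gives the unroll factor.
unroll_factor = sum(instr.startswith("sw") for instr in issued)

ipc = instructions / cycles
cycles_per_original_iteration = cycles / unroll_factor

print(f"instructions = {instructions}, cycles = {cycles}")
print(f"IPC = {ipc}")
print(f"cycles per original iteration = {cycles_per_original_iteration}")
```

Under these assumptions, one pass of the unrolled loop issues 14 instructions in 8 cycles, for an IPC of 1.75 out of a peak of 2. It covers four original iterations, so each original iteration costs 2 cycles.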

Spring 2015

CSE 471: Multiple Instruction Width




