



## Multiple Instruction Issue on Superscalars

Requires more capability & more sophisticated design:

- instruction fetch
  - fetch multiple instructions at once
  - prefetch speculatively beyond conditional branches
  - dynamic branch prediction
- instruction issue
  - issue multiple instructions in parallel
  - determine which instructions can be issued next
  - choose which of the ready instructions to issue
- execution
  - execute multiple instructions at once
- instruction commit
  - commit several instructions in fetch order
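
The issue-stage bullets (find the instructions that can issue, then choose among them) can be illustrated with a small sketch. It assumes a simple in-order, 2-wide machine that checks only register dependences; the names `Instr`, `select_issue_group`, and `busy_regs` are hypothetical, and real issue logic also tracks functional-unit availability, latencies, and branches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                # assembly text, for display only
    dest: Optional[str]      # register written, or None (stores, branches)
    srcs: Tuple[str, ...]    # registers read


def select_issue_group(window: List[Instr], busy_regs: Set[str], width: int = 2) -> List[Instr]:
    """Pick up to `width` instructions from the front of `window` to issue together.

    Assumption: in-order issue, so we stop at the first instruction that cannot go.
    `busy_regs` holds registers still being produced by earlier, unfinished instructions.
    """
    group: List[Instr] = []
    written: Set[str] = set()  # destinations of instructions already in this group
    for ins in window:
        if len(group) == width:
            break
        pending = busy_regs | written
        if any(src in pending for src in ins.srcs):       # RAW hazard
            break
        if ins.dest is not None and ins.dest in pending:  # WAW hazard
            break
        group.append(ins)
        if ins.dest is not None:
            written.add(ins.dest)
    return group


# The five-instruction loop body used in the scheduling slides.
loop = [
    Instr("lw R1, 0(R5)", "R1", ("R5",)),
    Instr("addu R1, R1, R6", "R1", ("R1", "R6")),
    Instr("sw R1, 0(R5)", None, ("R1", "R5")),
    Instr("addi R5, R5, -4", "R5", ("R5",)),
    Instr("bne R5, R0, Loop", None, ("R5", "R0")),
]

first_group = select_issue_group(loop, busy_regs=set())
print([ins.text for ins in first_group])
```

Under these assumptions the `lw` issues alone, because the `addu` behind it needs `R1`. Starting from the `sw` instead, the `sw` and `addi` can issue as a pair.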

Spring 2013

CSE 471 - Multiple Instruction Width

3







## Code Scheduling on Superscalars

|       | Original code    |
|-------|------------------|
| Loop: | lw R1, 0(R5)     |
|       | addu R1, R1, R6  |
|       | sw R1, 0(R5)     |
|       | addi R5, R5, -4  |
|       | bne R5, R0, Loop |

|       | ALU/branch instructions | memory instructions | clock cycle |
|-------|-------------------------|---------------------|-------------|
| Loop: |                         |                     | 1           |
|       |                         |                     | 2           |
|       | addu R1, R1, R6         |                     | 3           |
|       |                         | sw R1, 4(R5)        | 4           |

## Code Scheduling on Superscalars: Loop Unrolling

|       | ALU/branch instruction | Data transfer instruction | clock cycle |
|-------|------------------------|---------------------------|-------------|
| Loop: | addi R5, R5, -16       | lw R1, 0(R5)              | 1           |
|       |                        | lw R2, 12(R5)             | 2           |
|       | addu R1, R1, R6        | lw R3, 8(R5)              | 3           |
|       | addu R2, R2, R6        | lw R4, 4(R5)              | 4           |
|       | addu R3, R3, R6        | sw R1, 16(R5)             | 5           |
|       | addu R4, R4, R6        | sw R2, 12(R5)             | 6           |
|       |                        | sw R3, 8(R5)              | 7           |
|       | bne R5, R0, Loop       | sw R4, 4(R5)              | 8           |
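
To see what the unrolled schedule computes, here is a rough high-level sketch of the loop before and after unrolling by four. It is an illustration, not the slide's code: `mem` stands in for word-addressed memory, the index `i` plays the role of `R5 / 4`, `s` plays the role of `R6`, and the function names are hypothetical. It assumes the element count is a positive multiple of four, as the unrolled assembly does.

```python
def add_scalar_rolled(mem, n, s):
    """One element per iteration, walking downward like the original loop."""
    assert n > 0
    i = n                      # i models R5 / 4
    while True:
        mem[i] += s            # lw, addu, sw
        i -= 1                 # addi R5, R5, -4
        if i == 0:             # bne R5, R0, Loop (exit when R5 reaches 0)
            break


def add_scalar_unrolled4(mem, n, s):
    """Four elements per iteration; one pointer update and one branch per four."""
    assert n > 0 and n % 4 == 0
    i = n
    while True:
        mem[i] += s            # element at the old R5
        mem[i - 1] += s        # old R5 - 4
        mem[i - 2] += s        # old R5 - 8
        mem[i - 3] += s        # old R5 - 12
        i -= 4                 # addi R5, R5, -16
        if i == 0:
            break


a = list(range(9))             # slot 0 is never touched, slots 1..8 are updated
b = list(range(9))
add_scalar_rolled(a, 8, 100)
add_scalar_unrolled4(b, 8, 100)
print(a == b)
```

Both versions produce the same result; unrolling only reduces loop overhead and gives the scheduler more independent instructions to pair.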

How many cycles does each iteration take? What is the IPC?
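
One way to work these numbers is to count issue slots directly in the unrolled schedule. The sketch below transcribes the table into a hypothetical `schedule` list (`None` marks an empty slot) and assumes the loop was unrolled four times, which matches the four `lw`/`addu`/`sw` groups.

```python
# Each row is one clock cycle: (ALU/branch slot, data transfer slot).
schedule = [
    ("addi R5, R5, -16", "lw R1, 0(R5)"),
    (None,               "lw R2, 12(R5)"),
    ("addu R1, R1, R6",  "lw R3, 8(R5)"),
    ("addu R2, R2, R6",  "lw R4, 4(R5)"),
    ("addu R3, R3, R6",  "sw R1, 16(R5)"),
    ("addu R4, R4, R6",  "sw R2, 12(R5)"),
    (None,               "sw R3, 8(R5)"),
    ("bne R5, R0, Loop", "sw R4, 4(R5)"),
]

UNROLL_FACTOR = 4  # assumption: four original iterations per unrolled iteration

cycles = len(schedule)
instructions = sum(slot is not None for row in schedule for slot in row)
ipc = instructions / cycles
cycles_per_original_iteration = cycles / UNROLL_FACTOR

print(cycles, instructions, ipc, cycles_per_original_iteration)
```

This gives 14 instructions in 8 cycles, so the IPC is 1.75 and each original iteration costs 2 cycles. For comparison, the original five-instruction loop scheduled in 4 cycles has an IPC of 5/4 = 1.25 against a peak of 2.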




