















|       | Original code                  | With load-latency-hiding code |
|-------|--------------------------------|-------------------------------|
| Loop: | lw <mark>R1</mark>, 0(R5)      | lw R1, 0(R5)                  |
|       | addu R1, <mark>R1</mark>, R6   | addi R5, R5, -4               |
|       | sw R1, 0(R5)                   | addu R1, R1, R6               |
|       | addi R5, R5, -4                | sw R1, <mark>4(</mark>R5)     |
|       | bne R5, R0, Loop               | bne R5, R0, Loop              |

|       | ALU/branch instructions | Memory instructions | Clock cycle |
|-------|-------------------------|---------------------|-------------|
| Loop: |                         |                     | 1           |
|       |                         |                     | 2           |
|       |                         |                     | 3           |
|       |                         |                     | 4           |
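The load-latency-hiding version hoists `addi` above `sw`, so by the time the store executes R5 has already been decremented by 4; the store offset therefore changes from `0(R5)` to `4(R5)` to keep writing the same word. The minimal Python sketch below checks that claim by running both loops on a toy interpreter for just this MIPS subset. The helper names (`run`, `demo`), the tuple encoding of instructions, and the test data (four words at addresses 4 to 16, R6 = 7) are hypothetical, chosen only for illustration; timing and pipelining are not modeled, only the architectural result.

```python
# Toy interpreter for the MIPS subset on this slide (lw, sw, addu, addi, bne).
# Hypothetical helper: memory is a dict from byte address to word value.

def run(program, regs, mem):
    """Execute `program` (a list of tuples) until control falls off the end."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "lw":                      # lw rt, off(rs)
            rt, off, rs = args
            regs[rt] = mem[regs[rs] + off]
        elif op == "sw":                    # sw rt, off(rs)
            rt, off, rs = args
            mem[regs[rs] + off] = regs[rt]
        elif op == "addu":                  # addu rd, rs, rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "addi":                  # addi rt, rs, imm
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        elif op == "bne":                   # bne rs, rt, label
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = labels[target]
        regs["R0"] = 0                      # R0 ($0) is hardwired to zero
    return mem

ORIGINAL = [
    ("label", "Loop"),
    ("lw", "R1", 0, "R5"),
    ("addu", "R1", "R1", "R6"),
    ("sw", "R1", 0, "R5"),
    ("addi", "R5", "R5", -4),
    ("bne", "R5", "R0", "Loop"),
]

# addi hoisted above sw, so sw must use 4(R5) to reach the same word.
LATENCY_HIDING = [
    ("label", "Loop"),
    ("lw", "R1", 0, "R5"),
    ("addi", "R5", "R5", -4),
    ("addu", "R1", "R1", "R6"),
    ("sw", "R1", 4, "R5"),
    ("bne", "R5", "R0", "Loop"),
]

def demo(program):
    # Hypothetical test data: four words at addresses 4..16, R6 = 7.
    regs = {"R0": 0, "R1": 0, "R5": 16, "R6": 7}
    mem = {addr: addr * 10 for addr in (4, 8, 12, 16)}
    return run(program, regs, mem)

print(demo(ORIGINAL))        # {4: 47, 8: 87, 12: 127, 16: 167}
print(demo(LATENCY_HIDING))  # {4: 47, 8: 87, 12: 127, 16: 167}
```

Changing the hoisted store back to `0(R5)` makes the two results differ: each store then lands one word below its intended address and the next iteration loads the already-updated value.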




<u>Superscalars</u>

Hardware impact:
- more functional units, and pipelined ones
- multi-ported register file for multiple simultaneous register accesses
- more buses from the register file to the additional functional units
- multiple decoders
- more hazard-detection logic
- more bypass logic
- wider instruction fetch
- multi-banked L1 data cache

Otherwise the processor has structural hazards (due to an unbalanced design) and stalls.

There are restrictions on which instruction types can be issued together, to reduce the amount of hardware.

Compiler scheduling helps.

Spring 2010, CSE 471 - Multiple Instruction Width, 12
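One common form of the issue restriction mentioned above, assuming a MIPS-style two-issue machine with one ALU/branch slot and one load/store slot (the two columns of the schedule table for the loop example): a packet may hold at most one instruction of each class, and the second instruction in program order must not read or overwrite a register the first one writes. This is a minimal Python sketch under those assumptions; the function names and the exact rule are hypothetical, and load-use latency across packets is not modeled.

```python
# Hypothetical pairing rule for a two-issue machine with one ALU/branch slot
# and one load/store slot. Instructions are tuples, e.g. ("addi", "R5", "R5", -4).
ALU_BRANCH = {"addu", "addi", "bne"}
MEMORY = {"lw", "sw"}

def writes(ins):
    """Registers the instruction writes (lw/addu/addi write their first operand)."""
    return {ins[1]} if ins[0] in ("lw", "addu", "addi") else set()

def reads(ins):
    """Registers the instruction reads."""
    op, *a = ins
    if op == "addu":
        return {a[1], a[2]}
    if op == "addi":
        return {a[1]}
    if op == "lw":
        return {a[2]}            # base register only
    if op == "sw":
        return {a[0], a[2]}      # value to store and base register
    if op == "bne":
        return {a[0], a[1]}
    return set()

def can_dual_issue(first, second):
    """True if `first` and `second` (in program order) may share one packet."""
    one_of_each = (
        {first[0], second[0]} <= (ALU_BRANCH | MEMORY)
        and (first[0] in MEMORY) != (second[0] in MEMORY)
    )
    # Same-packet instructions cannot see each other's results (RAW) and
    # must not write the same register (WAW).
    independent = not (writes(first) & (reads(second) | writes(second)))
    return one_of_each and independent

print(can_dual_issue(("lw", "R1", 0, "R5"), ("addi", "R5", "R5", -4)))    # True
print(can_dual_issue(("addu", "R1", "R1", "R6"), ("sw", "R1", 4, "R5")))  # False
```

With at most one memory instruction per packet, for example, the data cache only has to serve one access per cycle, which is the kind of hardware saving such restrictions buy.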