### 548

#### Lecture 4 - CDC 6600

#### Why bother with executing out of order?

- Faster (hopefully)
- Execute operations sooner than their position in program order would allow
- An instruction fetched later may not depend on an earlier, stalled one
- Compiler order != critical path order
- Dynamic loop unrolling
- Bypassing long-latency operations, maximize parallelism
- Compilers not up to snuff
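To make "compiler order != critical path order" concrete, here is a minimal Python sketch that computes when each instruction of this lecture's scoreboard example could finish if only true (RAW) data dependences constrained it. The latencies in `LATENCY` are hypothetical, not CDC 6600 timings, and the model assumes unlimited functional units and ignores WAR/WAW conflicts (as if registers were renamed), which is precisely the freedom a scoreboard lacks.

```python
# Hypothetical latencies in cycles, chosen only for illustration.
LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}

# (opcode, destination, sources): the scoreboard example's instruction sequence.
PROGRAM = [
    ("LD", "F6", ["R2"]),
    ("LD", "F2", ["R3"]),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]


def dataflow_finish_times(program, latency):
    """Earliest cycle each instruction could finish if only true (RAW)
    dependences mattered: unlimited units, no WAR/WAW stalls."""
    ready = {}    # register -> cycle at which its newest value exists
    finish = []
    for op, dest, srcs in program:
        # Start as soon as the last source value has been produced.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        done = start + latency[op]
        finish.append(done)
        ready[dest] = done
    return finish


finish = dataflow_finish_times(PROGRAM, LATENCY)
print(finish)  # [1, 1, 11, 3, 51, 5]
```

With these assumed latencies, SUBD and ADDD both finish long before the earlier MULTD and DIVD, so strictly in-order execution would leave the add unit idle behind the long multiply and divide.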

### Mem->Issue

- Have:
  - Instruction & its bits
- WaitFor:
  - Spot in the scoreboard
- Do
  - Do it! Move instruction to scoreboard
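A minimal Python sketch of this stage, assuming the scoreboard is a fixed-size table (the `Scoreboard` class, its `capacity`, and `try_issue` are hypothetical names, and the capacity is not a 6600 figure). Instructions enter strictly in program order, so if the oldest fetched instruction cannot get a spot, nothing behind it moves either.

```python
from collections import deque


class Scoreboard:
    """Toy scoreboard holding at most `capacity` in-flight instructions.
    The capacity is a hypothetical parameter, not a CDC 6600 figure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def try_issue(self, fetched):
        """Mem->Issue: move the oldest fetched instruction into the scoreboard
        if a spot is free. Issue is in order, so a stalled head blocks the rest."""
        if fetched and len(self.entries) < self.capacity:
            self.entries.append(fetched.popleft())
            return True
        return False


sb = Scoreboard(capacity=2)
fetched = deque(["LD F6 34+ R2", "LD F2 45+ R3", "MULTD F0 F2 F4"])
print([sb.try_issue(fetched) for _ in range(3)])  # [True, True, False]
```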

### Issue->Dispatch

- WaitFor
  - A free functional unit
  - The result register to be free (no other in-flight instruction writing it, i.e. no WAW hazard)
- Do
  - Assign instruction to functional unit
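The two wait conditions can be sketched as a Python check against the functional unit busy flags and the register result status. The dict encoding and the names `can_dispatch`/`dispatch` are assumptions for illustration; the timing in the example (ADDD arriving while LD F6 is still in flight) is hypothetical.

```python
def can_dispatch(instr, fu_busy, reg_result):
    """Issue->Dispatch check for one instruction.

    instr:      dict naming the functional unit it needs and its destination
    fu_busy:    unit name -> True while that unit holds an instruction
    reg_result: register -> unit that will write it (register result status)
    """
    unit_free = not fu_busy[instr["unit"]]        # structural hazard check
    dest_free = instr["dest"] not in reg_result   # WAW hazard check
    return unit_free and dest_free


def dispatch(instr, fu_busy, reg_result):
    """Claim the unit and record it as the pending writer of the destination."""
    fu_busy[instr["unit"]] = True
    reg_result[instr["dest"]] = instr["unit"]


fu_busy = {"Integer": False, "Mult1": False, "Mult2": False,
           "Add": False, "Divide": False}
reg_result = {}

# Hypothetical timing: LD F6 is still in flight when the adds are checked.
ld_f6 = {"unit": "Integer", "dest": "F6"}
subd_f8 = {"unit": "Add", "dest": "F8"}
addd_f6 = {"unit": "Add", "dest": "F6"}

dispatch(ld_f6, fu_busy, reg_result)
subd_ok = can_dispatch(subd_f8, fu_busy, reg_result)   # True: Add free, F8 free
addd_ok = can_dispatch(addd_f6, fu_busy, reg_result)   # False: WAW on F6
```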

### Dispatch->Execute

- WaitFor
  - Operands to be ready (RAW hazards resolved)
  - The execute path to be free
- Do
  - Execute!
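The operand wait can be sketched in Python using the functional unit status fields from the scoreboard tables (Fi, Fj, Fk, Qj, Qk, Rj, Rk). This assumes the common textbook convention that Rj/Rk mean "ready but not yet read" and are cleared once the operands are read; `FUStatus` and `try_read_operands` are hypothetical names, and the "execute path free" condition is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table."""
    busy: bool = False
    op: str = ""
    fi: str = ""                # destination register
    fj: str = ""                # source register 1
    fk: str = ""                # source register 2
    qj: Optional[str] = None    # unit that will produce fj (None: nothing pending)
    qk: Optional[str] = None    # unit that will produce fk
    rj: bool = False            # fj ready and not yet read
    rk: bool = False            # fk ready and not yet read


def try_read_operands(fu: FUStatus) -> bool:
    """Dispatch->Execute: wait until both sources are ready (no RAW hazard),
    then read them; clearing Rj/Rk marks the values as consumed."""
    if not (fu.rj and fu.rk):
        return False
    fu.rj = fu.rk = False
    fu.qj = fu.qk = None
    return True


# MULTD F0,F2,F4 waiting for the load of F2 on the Integer unit.
mult1 = FUStatus(busy=True, op="Mult", fi="F0", fj="F2", fk="F4",
                 qj="Integer", rj=False, rk=True)
first_try = try_read_operands(mult1)    # False: F2 not produced yet
mult1.qj, mult1.rj = None, True         # the load's result broadcast arrives
second_try = try_read_operands(mult1)   # True: operands read, execution can start
```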

### Execute->Complete

- WaitFor
  - Execution to complete
  - The result bus
  - Write-after-read (WAR) conflicts to resolve
- Do
  - Broadcast result + metadata
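Both halves of this stage can be sketched in Python against the functional unit status fields. As above, this assumes the textbook convention that Rj/Rk mean "ready but not yet read"; `can_write_result` and `broadcast` are hypothetical names, and result-bus arbitration is not modeled. The WAR rule: a unit may not write its destination while another busy unit still has to read the old value of that register.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    fi: str = ""                # destination register
    fj: str = ""                # source register 1
    fk: str = ""                # source register 2
    qj: Optional[str] = None    # unit producing fj
    qk: Optional[str] = None    # unit producing fk
    rj: bool = False            # fj ready and not yet read
    rk: bool = False            # fk ready and not yet read


def can_write_result(unit: str, fus: Dict[str, FUStatus]) -> bool:
    """WAR check: block while another busy unit still has to read the old
    value of our destination (names it as a source, ready, not yet read)."""
    dest = fus[unit].fi
    for name, f in fus.items():
        if name == unit or not f.busy:
            continue
        if (f.fj == dest and f.rj) or (f.fk == dest and f.rk):
            return False
    return True


def broadcast(unit: str, fus: Dict[str, FUStatus]) -> None:
    """Write result: every unit waiting on `unit` (Qj/Qk) now has that operand ready."""
    for f in fus.values():
        if f.qj == unit:
            f.qj, f.rj = None, True
        if f.qk == unit:
            f.qk, f.rk = None, True


fus = {
    "Mult1": FUStatus(busy=True, fi="F0", fj="F2", fk="F4"),
    "Add": FUStatus(busy=True, fi="F6", fj="F8", fk="F2"),
    "Divide": FUStatus(busy=True, fi="F10", fj="F0", fk="F6",
                       qj="Mult1", rk=True),
}
add_blocked = not can_write_result("Add", fus)   # Divide still needs old F6
mult_ok = can_write_result("Mult1", fus)         # nobody needs the old F0
broadcast("Mult1", fus)                          # wakes Divide's F0 operand
```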

### Complete

- WaitFor
  - Register write to complete
- Do
  - Erase the instruction from the scoreboard
  - Free the functional unit resources
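A minimal Python sketch of the cleanup, reusing the same hypothetical encoding as a busy-flag dict plus a register result status dict (`complete` is an illustrative name, not from the lecture).

```python
def complete(unit, fu_busy, reg_result):
    """Complete: erase the instruction and free its resources.

    fu_busy:    unit name -> busy flag
    reg_result: register -> unit that will write it
    """
    fu_busy[unit] = False
    # Because WAW hazards stall at dispatch, at most one unit is recorded
    # per register; drop the entries that name this unit.
    for reg in [r for r, u in reg_result.items() if u == unit]:
        del reg_result[reg]


fu_busy = {"Add": True, "Divide": True}
reg_result = {"F6": "Add", "F10": "Divide"}
complete("Add", fu_busy, reg_result)
print(fu_busy, reg_result)  # {'Add': False, 'Divide': True} {'F10': 'Divide'}
```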

#### When is scoreboarding successful?

When computing independent results, written to different locations, on a variety of functional units

### Scoreboard Example

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 |       |               |                    |              |
| LD F2       | 45+ | R3 |       |               |                    |              |
| MULTD F0    | F2  | F4 |       |               |                    |              |
| SUBD F8     | F6  | F2 |       |               |                    |              |
| DIVD F10    | F0  | F6 |       |               |                    |              |
| ADDD F6     | F8  | F2 |       |               |                    |              |

Functional unit status:

| Time | Name    | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|----|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | No   |    |           |         |         |               |               |                |                |
|      | Mult1   | No   |    |           |         |         |               |               |                |                |
|      | Mult2   | No   |    |           |         |         |               |               |                |                |
|      | Add     | No   |    |           |         |         |               |               |                |                |
|      | Divide  | No   |    |           |         |         |               |               |                |                |

Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
|       | FU |    |    |    |    |    |     |     |     |     |

Revised from D. Patterson s1998
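The example program contains all three register hazard types the scoreboard has to track. A small Python sketch lists them pairwise; the tuple encoding and the `hazards` helper are assumptions made here for illustration. Among the results, DIVD F10 -> ADDD F6 is the WAR conflict that holds back ADDD's write in the clock-21 snapshot.

```python
from itertools import combinations

# The scoreboard example's instructions as (name, destination, sources).
PROGRAM = [
    ("LD F6", "F6", ["R2"]),
    ("LD F2", "F2", ["R3"]),
    ("MULTD F0", "F0", ["F2", "F4"]),
    ("SUBD F8", "F8", ["F6", "F2"]),
    ("DIVD F10", "F10", ["F0", "F6"]),
    ("ADDD F6", "F6", ["F8", "F2"]),
]


def hazards(earlier, later):
    """Register hazards a later instruction has with an earlier one."""
    _, d1, s1 = earlier
    _, d2, s2 = later
    found = set()
    if d1 in s2:
        found.add("RAW")   # later reads what earlier writes
    if d2 in s1:
        found.add("WAR")   # later overwrites what earlier still reads
    if d1 == d2:
        found.add("WAW")   # both write the same register
    return found


# combinations() keeps program order, so a always precedes b.
pairs = {(a[0], b[0]): hazards(a, b) for a, b in combinations(PROGRAM, 2)}
for (a, b), hs in pairs.items():
    if hs:
        print(f"{a} -> {b}: {', '.join(sorted(hs))}")
```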

Instruction status:

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULTD F0    | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status:

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj ready?) | Rk (Fk ready?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------------|----------------|
|      | Integer | No   |     |           |         |         |               |               |                |                |
|      | Mult1   | No   |     |           |         |         |               |               |                |                |
|      | Mult2   | No   |     |           |         |         |               |               |                |                |
|      | Add     | Yes  | Add | F6        | F8      | F2      |               |               | Yes            | Yes            |
|      | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes            | Yes            |

Register result status:

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|--------|-----|-----|-----|
| 21    | FU |    |    |    | Add |    | Divide |     |     |     |
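At clock 21, ADDD finished executing at clock 16 but still has no write-result time: DIVD only reads its operands at clock 21, and one of them is the old F6 that ADDD would overwrite. A small Python sketch reproduces that from the snapshot by asking, for each pending writer in the register result status, which busy units still need the old value. The `FU` record and `war_blockers` helper are hypothetical; the field values are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass
class FU:
    name: str
    fi: str      # destination register
    fj: str      # source register 1
    fk: str      # source register 2
    rj: bool     # source 1 ready, not yet read
    rk: bool     # source 2 ready, not yet read


# Busy units and register result status at clock 21.
busy = [
    FU("Add", fi="F6", fj="F8", fk="F2", rj=True, rk=True),
    FU("Divide", fi="F10", fj="F0", fk="F6", rj=True, rk=True),
]
reg_result = {"F6": "Add", "F10": "Divide"}


def war_blockers(writer, units):
    """Names of units that still have to read the old value of writer's destination."""
    return [u.name for u in units
            if u is not writer
            and ((u.fj == writer.fi and u.rj) or (u.fk == writer.fi and u.rk))]


by_name = {u.name: u for u in busy}
for reg, unit in reg_result.items():
    print(reg, unit, "blocked by", war_blockers(by_name[unit], busy))
```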







### What are the "I/O" "processors"?

- Fine-grained multithreaded (FGMT) processors on the 6600
- Uniform instruction set for I/O devices
- Why context switch main memory?
  - This is pre-virtual memory
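Fine-grained multithreading here means one shared execution path with a separate hardware context (PC, accumulator) per I/O processor, switching context every cycle. A toy Python model, assuming strict round-robin over 10 contexts and a made-up ISA where each "instruction" just adds a constant; `Context` and `run_barrel` are hypothetical names. The point it shows: consecutive instructions from one context are always 10 slots apart, so no context ever waits on its own previous result.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    pc: int = 0
    acc: int = 0
    trace: list = field(default_factory=list)   # cycles in which this context ran


def run_barrel(programs, cycles):
    """One shared execution unit; each cycle the next context in round-robin
    order executes one instruction (toy ISA: add a constant to acc)."""
    ctxs = [Context() for _ in programs]
    for cycle in range(cycles):
        tid = cycle % len(ctxs)
        ctx, prog = ctxs[tid], programs[tid]
        if ctx.pc < len(prog):
            ctx.acc += prog[ctx.pc]
            ctx.pc += 1
            ctx.trace.append(cycle)
    return ctxs


ctxs = run_barrel([[1, 2, 3]] * 10, cycles=30)
print(ctxs[3].trace)  # [3, 13, 23]
```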

#### What is the FIFO instruction stack?

- A very primitive instruction cache
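- Eight 60-bit words, holding up to 32 instructions (at 15 bits each)
- Optimized for loop back-edges: a short loop that fits in the stack can run without refetching from memory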
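The capacity arithmetic and the back-edge behavior can be sketched in Python. This is a simplified model, not the 6600's actual stack control logic: it assumes the stack is a plain FIFO of word addresses and that a fetch hits whenever the word is still resident. `InstructionStack` and `loop_hits` are hypothetical names.

```python
from collections import deque

WORDS = 8            # stack depth in 60-bit words (from the notes)
WORD_BITS = 60
MIN_INSTR_BITS = 15  # shortest 6600 instruction; 30-bit ones take two parcels

MAX_INSTRS = WORDS * WORD_BITS // MIN_INSTR_BITS   # 8 * 60 // 15 = 32


class InstructionStack:
    """FIFO of recently fetched instruction words (simplified model)."""

    def __init__(self, depth=WORDS):
        self.words = deque(maxlen=depth)   # oldest word drops out when full

    def fetch(self, addr):
        """Return True on a hit; on a miss, 'fetch' the word and push it."""
        if addr in self.words:
            return True
        self.words.append(addr)
        return False


def loop_hits(body_words, iterations):
    """Count hits when a loop of body_words consecutive words runs repeatedly."""
    stack = InstructionStack()
    return sum(stack.fetch(100 + w)
               for _ in range(iterations) for w in range(body_words))


print(MAX_INSTRS)         # 32
print(loop_hits(4, 3))    # 8: only the first pass misses
print(loop_hits(10, 3))   # 0: the FIFO evicts each word before it is reused
```

The all-or-nothing result is the design point: a loop body that fits is served entirely from the stack after its first pass, while one even slightly too large gets no benefit.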

# Your questions

- Precise exceptions?
- How much ILP/dataflow study before this?
- How much influence?
- When did memory get slower than computation?
- Major cycle?
- Why 10 control processors?
- What sort of performance improvement was had?
- Why is WAW not immediately resolvable?
- How much overhead is scoreboarding?
- Interrupts?
- Anyone still use this?
- Is 2% contention to be believed?
- Why only communicate via memory?
- Is this truly OoO?
- What about memory dependencies?
- Cost of the broadcast result busses?
- Relationship to Stretch? How to pipeline?
- Relationship between control processors and DMA?
- How much software actually ran on the CPs?
- How does scoreboarding work with timeslicing?
- Is there a place for physical registers in a scoreboarding machine?