







CSE 471: Advanced Caching Techniques, Spring 2015

<b>Non-blocking Caches</b>

In-order processors

| Instruction | Status |
|---|---|
| lw <b>\$3</b>, 100(\$4) | in execution, cache miss |
| add \$2, <b><mark>\$3</mark></b>, \$4 | consumer waits until the miss is satisfied |
| sub \$5, \$6, \$7 | independent instruction waits for the add |

Out-of-order processors

| Instruction | Status |
|---|---|
| lw <b>\$3</b>, 100(\$4) | in execution, cache miss |
| sub \$5, \$6, \$7 | independent instruction can execute during the cache miss |
| add \$2, <b><mark>\$3</mark></b>, \$4 | consumer waits until the miss is satisfied |
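
The difference between the two orderings can be sketched with a toy timing model. This is only an illustration under stated assumptions: the 10-cycle miss latency, the 1-cycle ALU latency, the single-issue pipeline, and the names `Instr` and `schedule` are all hypothetical, and register renaming is assumed so that only true (read-after-write) dependences matter.

```python
from dataclasses import dataclass

MISS_LATENCY = 10  # assumed cache-miss latency in cycles (not from the slides)
ALU_LATENCY = 1    # assumed latency of add/sub


@dataclass
class Instr:
    text: str
    dest: str
    srcs: tuple
    latency: int


# The three instructions from the slide, in program order.
PROGRAM = [
    Instr("lw  $3, 100($4)", "$3", ("$4",), MISS_LATENCY),
    Instr("add $2, $3, $4", "$2", ("$3", "$4"), ALU_LATENCY),
    Instr("sub $5, $6, $7", "$5", ("$6", "$7"), ALU_LATENCY),
]


def schedule(program, in_order):
    """Return the cycle in which each instruction starts executing."""
    ready = {}        # register -> cycle at which its new value is available
    starts = []
    prev_start = -1
    for fetch_cycle, ins in enumerate(program):
        # An instruction cannot start before its source operands are ready.
        operands_ready = max((ready.get(r, 0) for r in ins.srcs), default=0)
        start = max(fetch_cycle, operands_ready)
        if in_order:
            # In-order issue: cannot start before the previous instruction has.
            start = max(start, prev_start + 1)
        ready[ins.dest] = start + ins.latency
        starts.append(start)
        prev_start = start
    return starts


if __name__ == "__main__":
    for label, flag in (("in-order", True), ("out-of-order", False)):
        print(label)
        for ins, start in zip(PROGRAM, schedule(PROGRAM, flag)):
            print(f"  cycle {start:2d}: {ins.text}")
```

In this model the `sub` starts at cycle 11 on the in-order machine, behind the stalled `add`, but at cycle 2 on the out-of-order machine, while the miss is still outstanding. In both cases the `add` waits until cycle 10 for `$3`.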



<b>Sub-block Placement</b>

Divide a block into sub-blocks

| Tag | Sub-block 0 | Sub-block 1 | Sub-block 2 | Sub-block 3 |
|---|---|---|---|---|
| tag | I data | V data | V data | I data |
| tag | I data | V data | V data | V data |
| tag | V data | V data | V data | V data |
| tag | I data | I data | I data | I data |

(V = valid bit set, I = valid bit clear)

<ul>
<li>sub-block = unit of transfer on a cache miss</li>
<li>one valid bit per sub-block</li>
<li>2 kinds of misses:
<ul>
<li>block-level miss: tags didn't match</li>
<li>sub-block-level miss: tags matched, valid bit was clear</li>
</ul>
</li>
<li>advantage: a miss costs only the transfer time of a sub-block</li>
<li>advantage: fewer tags than if each block were the size of a sub-block</li>
<li>disadvantage: can't exploit spatial locality across sub-blocks, since only the missing sub-block is fetched</li>
</ul>

How does sub-block placement improve memory system performance?
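
The two kinds of misses can be made concrete with a small model. This is a minimal sketch, assuming a direct-mapped cache with 4 blocks, 4 sub-blocks per block, and 8-byte sub-blocks; the class name `SubBlockCache` and all sizes are hypothetical and chosen only for illustration.

```python
class SubBlockCache:
    """Toy direct-mapped cache: one tag per block, one valid bit per sub-block."""

    def __init__(self, num_blocks=4, subblocks_per_block=4, subblock_bytes=8):
        self.num_blocks = num_blocks
        self.subs = subblocks_per_block
        self.sub_bytes = subblock_bytes
        self.tags = [None] * num_blocks
        self.valid = [[False] * subblocks_per_block for _ in range(num_blocks)]

    def access(self, addr):
        """Classify an access and fetch at most one sub-block."""
        sub_number = addr // self.sub_bytes      # which sub-block in memory
        sub = sub_number % self.subs             # position within its block
        block_number = sub_number // self.subs   # which block in memory
        index = block_number % self.num_blocks   # cache line it maps to
        tag = block_number // self.num_blocks

        if self.tags[index] != tag:
            # Block-level miss: install the new tag, clear every valid bit,
            # and bring in only the requested sub-block.
            self.tags[index] = tag
            self.valid[index] = [False] * self.subs
            self.valid[index][sub] = True
            return "block-level miss"
        if not self.valid[index][sub]:
            # Sub-block-level miss: tag matched, but this sub-block is absent.
            self.valid[index][sub] = True
            return "sub-block-level miss"
        return "hit"


if __name__ == "__main__":
    cache = SubBlockCache()
    for addr in (0, 4, 8, 128, 8):
        print(f"address {addr:3d}: {cache.access(addr)}")
```

With these sizes the cache holds 16 sub-blocks but only 4 tags. A cache built from 16 independent blocks of sub-block size would need 16 tags for the same data capacity.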



















<b>Other Techniques</b>

<ul>
<li>Hardware or compiler-based prefetching (decreases misses)</li>
<li>Coupling a write-through memory update policy with a write buffer (eliminates store ops/hides store latencies)</li>
<li>TLB (reduces address translation time)</li>
<li>Cache hierarchies (reduce miss penalty)</li>
<li>Virtual caches (reduce L1 cache access time)</li>
<li>Wider bus (increases bandwidth)</li>
</ul>
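
To see why a cache hierarchy reduces the miss penalty, the standard average memory access time relation (hit time + miss rate × miss penalty) can be applied at each level. The sketch below is illustrative only: the function name `amat` and every latency and miss rate are assumed values, not figures from the course.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed parameters, in cycles and as fractions of accesses.
L1_HIT, L1_MISS_RATE = 1, 0.05
L2_HIT, L2_MISS_RATE = 10, 0.20   # L2 miss rate is local to L2 accesses
MEMORY_LATENCY = 100

# Without an L2, every L1 miss pays the full memory latency.
single_level = amat(L1_HIT, L1_MISS_RATE, MEMORY_LATENCY)

# With an L2, the L1 miss penalty becomes the L2's own average access time.
l1_miss_penalty = amat(L2_HIT, L2_MISS_RATE, MEMORY_LATENCY)
two_level = amat(L1_HIT, L1_MISS_RATE, l1_miss_penalty)

if __name__ == "__main__":
    print(f"L1 only:         {single_level:.2f} cycles per access")
    print(f"L1 + L2:         {two_level:.2f} cycles per access")
    print(f"L1 miss penalty: {MEMORY_LATENCY} -> {l1_miss_penalty:.0f} cycles")
```

Under these assumptions the L1 miss penalty drops from 100 cycles to roughly 30, which is what pulls the average access time down even though the L1 hit time and miss rate are unchanged.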
