













Autumn 2006, CSE P548 - Memory Hierarchy

**Design Tradeoffs**

Block size: the bigger the block,

- (+) the better the spatial locality
- (+) less block transfer overhead per block
- (+) less tag overhead per entry (assuming the same number of entries)
- (-) might not access all the bytes in the block
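The tag-overhead point can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a direct-mapped cache, a 32-bit address, and power-of-two sizes, none of which the slide specifies, and the function names are hypothetical.

```python
def address_fields(address_bits, num_entries, block_bytes):
    """Return (tag, index, offset) bit counts for a direct-mapped cache.

    Assumes num_entries and block_bytes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_entries.bit_length() - 1    # log2(number of entries)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def tag_overhead(address_bits, num_entries, block_bytes):
    """Tag bits stored per bit of data in one cache entry."""
    tag_bits, _, _ = address_fields(address_bits, num_entries, block_bytes)
    return tag_bits / (block_bytes * 8)


if __name__ == "__main__":
    # Same number of entries (256), growing block size (assumed values).
    for block_bytes in (16, 32, 64):
        tag_bits, _, _ = address_fields(32, 256, block_bytes)
        print(block_bytes, tag_bits, tag_overhead(32, 256, block_bytes))
```

With the number of entries held fixed, each doubling of the block size removes one tag bit per entry and doubles the data the tag covers, so the relative overhead falls quickly.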

**Design Tradeoffs**

Associativity: the larger the associativity,

- (+) the higher the hit ratio
- (-) the larger the hardware cost (comparator/set)
- (-) the longer the hit time (a larger MUX)
- (-) need hardware that decides which block to replace
- (-) increase in tag bits (if same size cache)

Associativity is more important for small caches than for large ones, because more memory locations map to the same line (e.g., TLBs).
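The hit-ratio effect of associativity shows up most clearly on a trace where two blocks map to the same line. The following is a minimal sketch, not a description of any particular hardware: it assumes LRU replacement (the slide only says some replacement hardware is needed), and the function name, trace, and sizes are made up for illustration.

```python
from collections import OrderedDict


def hit_ratio(addresses, num_blocks, ways, block_bytes=16):
    """Simulate a set-associative cache with LRU replacement (an assumption)."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)          # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[tag] = True
    return hits / len(addresses)


if __name__ == "__main__":
    # Two addresses that collide in a 4-block direct-mapped cache.
    trace = [0, 64] * 4
    print(hit_ratio(trace, num_blocks=4, ways=1))  # direct-mapped
    print(hit_ratio(trace, num_blocks=4, ways=2))  # 2-way set-associative
```

In the direct-mapped case the two blocks keep evicting each other, while the 2-way cache of the same total size holds both after the first two misses.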

**Design Tradeoffs**

Memory update policy

- write-through
  - performance depends on the # of writes
  - store buffer decreases this
    - store compression
    - check on load misses
- write-back
  - performance depends on the # of dirty block replacements, but
    - dirty bit & logic for checking it
    - tag check before the write
    - must flush the cache before I/O
    - optimization: fetch before replace
- both use a merging store buffer
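The difference in what each policy's performance depends on can be seen by counting writes that reach memory. This is a rough sketch under stated assumptions: a direct-mapped cache, write-allocate for both policies, and no store buffer. The function name and the trace format are hypothetical.

```python
def memory_writes(trace, num_lines, block_bytes, policy):
    """Count writes sent to memory for a trace of ("r" | "w", address) pairs.

    policy is "write-through" or "write-back". Assumes a direct-mapped,
    write-allocate cache with no store buffer.
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    writes_to_memory = 0
    for op, addr in trace:
        block = addr // block_bytes
        index = block % num_lines
        tag = block // num_lines
        if tags[index] != tag:                       # miss: replace resident block
            if policy == "write-back" and dirty[index]:
                writes_to_memory += 1                # dirty block is written back
            tags[index] = tag
            dirty[index] = False
        if op == "w":
            if policy == "write-through":
                writes_to_memory += 1                # every store goes to memory
            else:
                dirty[index] = True                  # defer until replacement
    return writes_to_memory


if __name__ == "__main__":
    # Ten stores to one block, then a load that evicts it.
    trace = [("w", 0)] * 10 + [("r", 64)]
    for policy in ("write-through", "write-back"):
        print(policy, memory_writes(trace, num_lines=4, block_bytes=16, policy=policy))
```

Write-through pays once per store, whereas write-back pays once per dirty block replaced, which is why repeated stores to the same block favor write-back.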











**Design Tradeoffs**

Virtually-addressed caches:

- (-) need to flush the cache on a context switch
  - process identification (PID) can avoid this
- (-) synonyms
  - "the synonym problem"
    - if 2 processes are sharing data, two (different) virtual addresses map to the same physical address
    - 2 copies of the same data in the cache
    - on a write, only one will be updated, so the other has old data
  - a solution: page coloring
    - processes share segments; all shared data have the same offset from the beginning of a segment, i.e., the same low-order bits
    - cache must be <= the segment size (more precisely, each set of the cache must be <= the segment size)
    - index taken from segment offset, tag compare on segment #
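The size condition in the page-coloring solution can be checked numerically. The sketch below assumes 64 KB segments, 32-byte blocks, and a direct-mapped cache indexed by virtual address; these values and the helper name are illustrative assumptions, not taken from the slide.

```python
SEGMENT_BYTES = 64 * 1024   # assumed segment size
BLOCK_BYTES = 32            # assumed block size


def cache_index(vaddr, cache_bytes, ways=1):
    """Set index of a virtual address in a virtually indexed cache."""
    num_sets = cache_bytes // (BLOCK_BYTES * ways)
    return (vaddr // BLOCK_BYTES) % num_sets


if __name__ == "__main__":
    offset = 0x1240
    # Two synonyms: same segment offset, different segment numbers.
    a = 3 * SEGMENT_BYTES + offset
    b = 6 * SEGMENT_BYTES + offset

    # Cache no larger than a segment: index comes only from the offset.
    print(cache_index(a, 16 * 1024), cache_index(b, 16 * 1024))

    # Cache larger than a segment: segment-number bits leak into the index.
    print(cache_index(a, 256 * 1024), cache_index(b, 256 * 1024))
```

When the cache (or each set of it) fits within a segment, both synonyms select the same line, so only one copy can exist; with the larger cache they can land in different lines, which is the synonym problem.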





**Cache Hierarchies**

Cache hierarchy

- different caches with different sizes & access times & purposes
- (+) decrease effective memory access time:
  - many misses in the L1 cache will be satisfied by the L2 cache
  - avoid going all the way to memory
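A back-of-the-envelope calculation shows how an L2 cache lowers the effective access time. The formula below is the usual average-memory-access-time recurrence; all latencies and miss ratios are invented for illustration, and the parameter names (including the L2 miss ratio measured over L1 misses only) are assumptions rather than terms from the slide.

```python
def effective_access_time(l1_hit_time, l1_miss_ratio,
                          l2_hit_time, l2_miss_ratio_of_l1_misses,
                          memory_time):
    """Average access time with an L2 cache behind L1 (times in cycles)."""
    l1_miss_penalty = l2_hit_time + l2_miss_ratio_of_l1_misses * memory_time
    return l1_hit_time + l1_miss_ratio * l1_miss_penalty


def effective_access_time_no_l2(l1_hit_time, l1_miss_ratio, memory_time):
    """Average access time when every L1 miss goes all the way to memory."""
    return l1_hit_time + l1_miss_ratio * memory_time


if __name__ == "__main__":
    # Assumed numbers: 1-cycle L1, 10-cycle L2, 100-cycle memory.
    with_l2 = effective_access_time(1, 0.04, 10, 0.25, 100)
    without_l2 = effective_access_time_no_l2(1, 0.04, 100)
    print(with_l2, without_l2)
```

Under these assumed numbers, most L1 misses are satisfied at L2 cost rather than memory cost, which roughly halves the average.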

**Cache Hierarchies**

Level 1 cache goal: fast access, so minimize hit time (the common case)









**Measuring Cache Hierarchy Performance**

Global Miss Ratio:

$$\text{globalMR} = \frac{\#\text{misses in cache}}{\#\text{references generated by CPU}}$$

Example:

- 1000 references
- 40 L1 misses
- 10 L2 misses

global MR (L1):

global MR (L2):
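The two ratios left open in the example follow directly from the definition. A minimal sketch, with a hypothetical function name, that plugs in the numbers from the example:

```python
def global_miss_ratio(misses_in_cache, cpu_references):
    """Misses in a given cache divided by all references the CPU generated."""
    return misses_in_cache / cpu_references


if __name__ == "__main__":
    references = 1000
    print("global MR (L1):", global_miss_ratio(40, references))  # 40 / 1000
    print("global MR (L2):", global_miss_ratio(10, references))  # 10 / 1000
```

Both ratios share the same denominator, the 1000 CPU references, so they come out to 4% for L1 and 1% for L2.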

