



## **Cache Coherency**

Cache coherency protocols

- (usually) hardware mechanism for maintaining cache coherency
- coherency state associated with a cache block of data
- operations on shared data change the state
  - for the processor that initiates an operation
  - for other processors that have the data of that operation resident in their caches
- two general types
  - snooping with a bus
  - directory with a multi-path interconnect

In sum, hardware implementation for:

- sharing state of each cache block
- rules for changing this state in response to memory operations
  - implemented as a state transition diagram

Spring 2013, CSE 471 - Cache Coherence





























## **Directory Implementation**

- state is associated with units of memory that are the size of cache blocks: directory state
- each directory tracks the coherence state of the units in its memory and updates it
  - uncached (invalid in snooping):
    - no processor has the data cached & memory is up-to-date
  - shared:
    - at least 1 processor has the data cached & memory is up-to-date
    - block can be read by any processor
  - exclusive (also called modified):
    - only 1 processor (the owner) has the data cached & memory is stale
    - only that processor can write to it
- directory tracks which processors share its memory blocks
  - vector of presence bits (1/processor) to indicate which processor(s) have cached the data
  - dirty bit to indicate if exclusive
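The bookkeeping described above (a presence-bit vector plus a dirty bit per memory block) can be sketched in a few lines of Python. This is only an illustration: the class name `DirectoryEntry` and its methods are hypothetical, and the slides do not prescribe any particular layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectoryEntry:
    """Directory state for one memory block (hypothetical layout)."""
    presence: int = 0    # bit i set => processor i has the block cached
    dirty: bool = False  # True => one cached copy exists and memory is stale

    def sharers(self) -> List[int]:
        """Return the ids of the processors whose presence bit is set."""
        return [i for i in range(self.presence.bit_length())
                if (self.presence >> i) & 1]

    def state(self) -> str:
        """Derive the directory state from the presence bits and dirty bit."""
        if self.presence == 0:
            return "uncached"
        return "exclusive" if self.dirty else "shared"

    def add_sharer(self, p: int) -> None:
        """Record that processor p holds a read-only copy."""
        self.presence |= 1 << p

    def set_owner(self, p: int) -> None:
        """Record that processor p is the only holder and may write."""
        self.presence = 1 << p
        self.dirty = True


entry = DirectoryEntry()
entry.add_sharer(0)
entry.add_sharer(2)
print(entry.state(), entry.sharers())
entry.set_owner(1)
print(entry.state(), entry.sharers())
```

Deriving the state from the two fields, rather than storing it separately, keeps the three states and the presence vector from drifting out of sync in this sketch.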









## **Directory Protocol Messages**

| Message type | Source | Destination | Message content | Description |
|------------------|----------------|----------------|---------|-------------|
| Read miss | Local cache | Home directory | P, A | P reads data at address A; make P a read sharer and arrange to send data back |
| Write miss | Local cache | Home directory | P, A | P writes data at address A; make P the exclusive owner and arrange to send data back |
| Invalidate | Home directory | Remote caches | A | Invalidate a shared copy at address A |
| Fetch | Home directory | Remote cache | A | Fetch the block at address A and send it to its home directory |
| Fetch/Invalidate | Home directory | Remote cache | A | Fetch the block at address A and send it to its home directory; invalidate the block in the cache |
| Data value reply | Home directory | Local cache | Data | Return a data value from the home memory (read or write miss response) |
| Data write-back | Remote cache | Home directory | A, Data | Write back a data value for address A (invalidate response) |
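To see how these messages fit together, here is a minimal sketch of a home directory reacting to read and write misses for a single block. The class `HomeDirectory`, the method `handle_miss`, and the exact transition rules are assumptions for illustration (they follow a common textbook formulation of a directory protocol), and the sketch assumes the requester does not already own the block.

```python
from typing import List, Set, Tuple

Message = Tuple[str, str, str, str]  # (type, source, destination, content)


class HomeDirectory:
    """Directory state for a single block at address A (illustrative only)."""

    def __init__(self, addr: str = "A") -> None:
        self.addr = addr
        self.state = "uncached"
        self.sharers: Set[int] = set()

    def handle_miss(self, kind: str, p: int) -> List[Message]:
        """Process a 'read' or 'write' miss from processor p.

        Returns every message exchanged, starting with the request itself.
        """
        a = self.addr
        name = "Read miss" if kind == "read" else "Write miss"
        log: List[Message] = [(name, f"cache {p}", "home directory", f"P{p}, {a}")]

        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            # Memory is stale, so the block must come back from the owner.
            fetch = "Fetch" if kind == "read" else "Fetch/Invalidate"
            log.append((fetch, "home directory", f"cache {owner}", a))
            log.append(("Data write-back", f"cache {owner}",
                        "home directory", f"{a}, Data"))
            if kind == "write":
                self.sharers.clear()  # the old owner's copy was invalidated
        elif self.state == "shared" and kind == "write":
            # Every other read-only copy has to be invalidated before a write.
            for s in sorted(self.sharers - {p}):
                log.append(("Invalidate", "home directory", f"cache {s}", a))
            self.sharers.clear()

        log.append(("Data value reply", "home directory", f"cache {p}", "Data"))
        self.sharers.add(p)
        self.state = "shared" if kind == "read" else "exclusive"
        return log


directory = HomeDirectory()
for kind, p in [("read", 1), ("read", 2), ("write", 0), ("read", 1)]:
    for msg in directory.handle_miss(kind, p):
        print(msg)
    print("->", directory.state, sorted(directory.sharers))
```

Returning the message log instead of sending anything keeps the sketch free of any interconnect model; a real implementation would also have to handle write-backs on eviction and requests that race with each other.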



## **Directory FSM for a Memory Block**

[Figure: state transition diagram of the directory FSM for a memory block]













## **Important Issues**

- coherency:
  - its definition
  - the hardware support
  - write-invalidate protocols
    - how bus-based protocols work
    - how directories work
  - how coherency protocols match or take advantage of the MP design
- additions to our knowledge:
  - a 4th type of miss (coherency misses)
  - a 3rd locality (processor)
  - a 2nd application of snooping (bus-based coherency protocol)
  - a 2nd use of sub-block placement
  - a 3rd latency vs. throughput trade-off



## **Apply What You Know**

- coherency based on invalidations:
  - what triggers state transitions
  - the state changes, given a sequence of memory operations

## Example:

Assume you have a 4-state, write-invalidate protocol, in which three of the states are those used in the baseline 3-state protocol we studied in class and the fourth state is a new one, called *private clean*. A private clean state means that there is only one cached copy of the data, and that it is a read-only copy (i.e., it has the same value as its backup in memory). Using this new 4-state coherency protocol, fill in the state values for a single cache block in each of the processors (P0, P1, P2), for each of the memory operations listed in the first column. Assume the multiprocessor is bus-based.

| Operations   | P0      | P1      | P2      |
|--------------|---------|---------|---------|
| Initially    | invalid | invalid | invalid |
| P1: loads B  |         |         |         |
| P2: loads B  |         |         |         |
| P0: stores B |         |         |         |
| P1: loads B  |         |         |         |
| P1: stores B |         |         |         |
|              |         |         |         |
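One way to check an answer to this exercise is to simulate the protocol. The sketch below assumes the baseline three states are invalid, shared, and modified, that a read miss downgrades every other cached copy to shared, and that a store from invalid or shared invalidates all other copies. These rules are assumptions, so adjust them if the protocol covered in class differs. The function names `step` and `run` are hypothetical.

```python
from typing import Dict, List, Tuple

INVALID, SHARED, MODIFIED, PRIVATE_CLEAN = (
    "invalid", "shared", "modified", "private clean")


def step(states: Dict[str, str], proc: str, op: str) -> None:
    """Apply one load or store by `proc` to the block's per-cache states."""
    others = [p for p in states if p != proc]
    if op == "load":
        if states[proc] != INVALID:
            return  # read hit: no bus transaction, no state change
        # Read miss on the bus: every other copy is downgraded to shared
        # (a modified copy is assumed to supply or write back the data).
        had_copy = False
        for p in others:
            if states[p] != INVALID:
                had_copy = True
                states[p] = SHARED
        states[proc] = SHARED if had_copy else PRIVATE_CLEAN
    elif op == "store":
        if states[proc] in (INVALID, SHARED):
            # Write miss or upgrade: snooping caches invalidate their copies.
            for p in others:
                states[p] = INVALID
        # From private clean, the write needs no bus transaction at all.
        states[proc] = MODIFIED
    else:
        raise ValueError(f"unknown operation: {op}")


def run(ops: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return the (P0, P1, P2) states after each operation in `ops`."""
    states = {"P0": INVALID, "P1": INVALID, "P2": INVALID}
    rows = []
    for proc, op in ops:
        step(states, proc, op)
        rows.append((states["P0"], states["P1"], states["P2"]))
    return rows


ops = [("P1", "load"), ("P2", "load"), ("P0", "store"),
       ("P1", "load"), ("P1", "store")]
for (proc, op), row in zip(ops, run(ops)):
    print(f"{proc}: {op}s B".ljust(14), " | ".join(row))
```

Under these assumed rules, the first load leaves P1 in private clean because no other cache holds B. That is the case the fourth state exists for: a later store by P1 could then move to modified without a bus transaction.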
