# Memory Consistency: A Crash Course

Brandon Lucia CSE 471

## Memory Consistency Model

Informal definitions:

"Defines the value a read operation may read at each point during the execution"

"Defines the set of legal observable orders of memory operations during an execution"

"Defines which reorderings of memory operations are permitted"

### Review: Coherence



#### 2 Invariants:

1) "One Writer or One or More Readers"

2) "Reading X gets the value of the last write to X"

## Without Coherence

(The coherence invariants prevent this from happening)



Processors can't agree on who wrote last, so one processor (green, in the original figure) is left reading a stale value.

## Coherence is Ordering



Coherence defines the set of legal orders of accesses to a single memory location

## Consistency is Ordering



Consistency defines the set of legal orders of accesses to multiple memory locations

## Sequential Consistency (SC)

The simplest, most intuitive memory consistency model

#### Two Invariants of SC:

1) Instructions are executed in program order

2) All processors agree on a single total order of executed instructions
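To make the two invariants concrete, here is a minimal Python sketch that lists every execution SC allows for a two-thread program. Each such execution is a merge of the threads that keeps each thread's operations in program order (invariant 1), and each merge is one candidate total order that all processors agree on (invariant 2). The function name `sc_interleavings` and the example program are illustrative assumptions, not taken from the lecture.

```python
from itertools import combinations

def sc_interleavings(t0, t1):
    """Yield every total order of the two threads' operations that keeps
    each thread's own operations in program order."""
    n = len(t0) + len(t1)
    # Choose which global positions belong to thread 0; thread 1 gets the rest.
    for slots in combinations(range(n), len(t0)):
        order, i, j = [], 0, 0
        for pos in range(n):
            if pos in slots:
                order.append(t0[i])
                i += 1
            else:
                order.append(t1[j])
                j += 1
        yield order

thread0 = ["Wr X", "Rd Y"]
thread1 = ["Wr Y", "Rd X"]
orders = list(sc_interleavings(thread0, thread1))
for order in orders:
    print(", ".join(order))
print(len(orders), "SC executions")  # 6 SC executions
```

Two threads of two operations each yield C(4, 2) = 6 legal orders. An order such as "Rd Y, Wr X, ..." never appears, because it would break thread 0's program order.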
```
Execution (one total order, agreed on by all processors)
Wr X
Rd Y
Wr Y
Rd X
```

# Who cares?... You care!

### SC is how programmers think.



Figure: an intuitive (SC) execution order side by side with a weird (not SC) one.

SC prohibits **all** reordering of instructions (Invariant 1)

# Why are Instructions Reordered? And when does it matter anyway?

## Why are Instructions Reordered?

Optimization.



## Reordering #1: Write Buffers

Buffered writes eventually end up in coherent shared memory.



Program (initially X == Y == 0):

```
Thread 0      Thread 1
X = 1         Y = 1
r1 = Y        r2 = X
```

Is r1 == r2 == 0 a valid result?



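r1 == r2 == 0 is **not** SC, but it can happen with write buffers. Under SC, the first operation in the total order must be one of the two writes, and the other thread's read comes after it, so at least one of r1 and r2 must be 1.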
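One way to check the claim is to enumerate every SC execution of the program. This minimal Python sketch runs each program-order-preserving interleaving against a single shared memory, where every write is visible to everyone immediately, and collects the (r1, r2) pairs. The op encoding and the name `sc_outcomes` are illustrative assumptions.

```python
from itertools import combinations

# Each op is (kind, location, register); every write in this program stores 1.
THREAD0 = [("W", "X", None), ("R", "Y", "r1")]  # X = 1; r1 = Y
THREAD1 = [("W", "Y", None), ("R", "X", "r2")]  # Y = 1; r2 = X

def sc_outcomes(t0, t1):
    """Run every program-order-preserving interleaving on one shared memory."""
    n = len(t0) + len(t1)
    results = set()
    for slots in combinations(range(n), len(t0)):
        mem = {"X": 0, "Y": 0}  # initially X == Y == 0
        regs = {}
        i = j = 0
        for pos in range(n):
            if pos in slots:
                kind, loc, reg = t0[i]
                i += 1
            else:
                kind, loc, reg = t1[j]
                j += 1
            if kind == "W":
                mem[loc] = 1
            else:
                regs[reg] = mem[loc]
        results.add((regs["r1"], regs["r2"]))
    return results

print(sorted(sc_outcomes(THREAD0, THREAD1)))  # [(0, 1), (1, 0), (1, 1)]
```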






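Write buffers (WBs) let reads finish before older writes: each processor parks its write in its private buffer and moves on, so the following read can complete from memory before the write drains.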

Program: initially X == Y == 0

```
Execution

r1=Y [r1 <- 0]

r2=X [r2 <- 0]

X=1  (drains from the write buffer)

Y=1  (drains from the write buffer; Not SC!)
```
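The write-buffer execution can also be found mechanically. Below is a minimal sketch of a TSO-style machine. It assumes per-thread FIFO write buffers, with reads checking their own buffer first (store-to-load forwarding). It is a toy model for this one program, not Intel's specification, and `tso_outcomes` is a hypothetical name. It explores every interleaving of instruction issue and buffer drain.

```python
PROGRAM = (
    (("W", "X", 1), ("R", "Y", "r1")),  # Thread 0: X = 1; r1 = Y
    (("W", "Y", 1), ("R", "X", "r2")),  # Thread 1: Y = 1; r2 = X
)

def tso_outcomes(program):
    # State: (program counters, memory, per-thread write buffers, registers).
    start = ((0, 0), (("X", 0), ("Y", 0)), ((), ()), ())
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem, bufs, regs = state
        if all(pc == len(ops) for pc, ops in zip(pcs, program)) and not any(bufs):
            finals.add(tuple(v for _, v in sorted(regs)))
            continue
        for t, ops in enumerate(program):
            # Step 1: thread t issues its next instruction.
            if pcs[t] < len(ops):
                kind, loc, arg = ops[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "W":
                    # The write enters t's buffer, not shared memory.
                    new_bufs = bufs[:t] + (bufs[t] + ((loc, arg),),) + bufs[t + 1:]
                    stack.append((new_pcs, mem, new_bufs, regs))
                else:
                    # Store forwarding: the newest buffered write to loc wins.
                    pending = [v for l, v in bufs[t] if l == loc]
                    value = pending[-1] if pending else dict(mem)[loc]
                    stack.append((new_pcs, mem, bufs, regs + ((arg, value),)))
            # Step 2: the oldest entry in t's buffer drains to memory.
            if bufs[t]:
                (loc, val), rest = bufs[t][0], bufs[t][1:]
                new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                new_bufs = bufs[:t] + (rest,) + bufs[t + 1:]
                stack.append((pcs, new_mem, new_bufs, regs))
    return finals

print(sorted(tso_outcomes(PROGRAM)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Compared with the SC enumeration, the only new outcome is (0, 0). It arises exactly as in the execution above: both reads complete while both writes are still sitting in the buffers.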



## Reordering #2: Coalescing Write Buffers

Program: X=1, Y=1, Z=1 (in that order), with 4-word cache lines and X, Z in the same cache line.

Coalescing write buffer after all three writes:

| Buffer entry | Buffered writes |
|--------------|-----------------|
| 1 (cache line holding X and Z) | X=1, Z=1 |
| 2 (cache line holding Y) | Y=1 |

Combining the writes to X and Z into one entry saves bandwidth, but **reorders** Z=1 and Y=1: Z=1 drains to memory together with X=1, before Y=1.
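A minimal sketch of the coalescing behavior, assuming a hypothetical layout (`LINE_OF`) that puts X and Z on cache line 0 and Y on line 1, and a buffer that merges a new write into an older entry for the same line and drains entries oldest-first. The function names are illustrative.

```python
# Hypothetical layout: X and Z share cache line 0; Y lives on line 1.
LINE_OF = {"X": 0, "Z": 0, "Y": 1}

def coalescing_buffer(writes):
    """Buffer writes, merging each into an existing entry for the same cache
    line when one exists; return the entries in FIFO drain order."""
    entries = []  # list of (line, {location: value})
    for loc, val in writes:
        line = LINE_OF[loc]
        for entry_line, words in entries:
            if entry_line == line:  # coalesce into the older entry
                words[loc] = val
                break
        else:
            entries.append((line, {loc: val}))
    return entries

def visibility_order(entries):
    # Each entry drains as one unit, so its writes become visible together.
    return [loc for _, words in entries for loc in words]

program = [("X", 1), ("Y", 1), ("Z", 1)]  # program order: X=1, Y=1, Z=1
buf = coalescing_buffer(program)
print(buf)                    # [(0, {'X': 1, 'Z': 1}), (1, {'Y': 1})]
print(visibility_order(buf))  # ['X', 'Z', 'Y']: Z=1 becomes visible before Y=1
```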

## Reordering #3: Compilers

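Before (Thread 2 writes X = 0 once while Thread 1 loops):

```
Thread 1              Thread 2
for (i .. 100)        X = 0
    X = 1
    print X
```

After hoisting X = 1 out of the loop:

```
Thread 1              Thread 2
X = 1                 X = 0
for (i .. 100)
    print X
```

The compiler hoists the write out of the loop, permitting new (non-SC) results (e.g., "1 0 0 0 0 0 ..."). In the original program every iteration rewrites X = 1 just before printing, so no SC execution of it can print more than one 0.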
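A minimal sketch comparing the two versions. It assumes Thread 1 runs `X = 1; print X` in a loop (trip count shrunk from 100 to a hypothetical N = 4), and Thread 2 runs a single `X = 0`. It enumerates every SC placement of Thread 2's write among Thread 1's operations and reports the most 0s either version can print.

```python
N = 4  # hypothetical trip count, shrunk from 100 to keep outputs short

# Thread 1's operations before and after the compiler hoists X = 1.
ORIGINAL = [op for _ in range(N) for op in (("write", 1), ("print", None))]
HOISTED = [("write", 1)] + [("print", None)] * N

def sc_outputs(thread1_ops):
    """Insert Thread 2's single X = 0 at every possible point and collect
    the sequence of values Thread 1 prints."""
    results = set()
    for k in range(len(thread1_ops) + 1):
        ops = thread1_ops[:k] + [("write", 0)] + thread1_ops[k:]
        x, printed = 0, []
        for kind, value in ops:
            if kind == "write":
                x = value
            else:
                printed.append(x)
        results.add(tuple(printed))
    return results

def max_zeros(thread1_ops):
    return max(out.count(0) for out in sc_outputs(thread1_ops))

print(max_zeros(ORIGINAL))  # 1
print(max_zeros(HOISTED))   # 4
```

The hoisted version can print (1, 0, 0, 0), which no interleaving of the original program produces.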

### When is Reordering a Problem?

Thursday, April 26, 2012

When executions aren't SC, that is, when a memory operation happens before itself.

```
Execution
r1=Y [r1 <- 0]
r2=X [r2 <- 0]
X=1
Y=1
```

#### Happens-Before Graph

Program order HB edges: X=1 -> r1=Y (Thread 0), Y=1 -> r2=X (Thread 1)

Causal order HB edges: r1=Y -> Y=1 (r1 read the value from before Y=1), r2=X -> X=1 (r2 read the value from before X=1)

Following the edges, X=1 -> r1=Y -> Y=1 -> r2=X -> X=1, so X=1 happens before itself.

If there is a cycle in the happens-before graph, the execution is not SC.
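The cycle test can be sketched as an ordinary depth-first search. The edge rules are an assumption spelled out in the comments: program order within a thread, plus "a read that returned the old value happens before the write that overwrote it". The node names and `find_cycle` are illustrative.

```python
# HB graph for the write-buffer execution: r1 and r2 both read 0.
HB_EDGES = {
    "X=1": ["r1=Y"],   # program order, thread 0
    "Y=1": ["r2=X"],   # program order, thread 1
    "r1=Y": ["Y=1"],   # r1 read 0, so it came before Y=1
    "r2=X": ["X=1"],   # r2 read 0, so it came before X=1
}

def find_cycle(graph):
    """Return one cycle as a list of nodes, or None (recursive three-color DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    path = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for succ in graph.get(node, []):
            if color.get(succ, WHITE) == GRAY:  # back edge: cycle found
                return path[path.index(succ):] + [succ]
            if color.get(succ, WHITE) == WHITE:
                found = dfs(succ)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

print(find_cycle(HB_EDGES))  # ['X=1', 'r1=Y', 'Y=1', 'r2=X', 'X=1']
```

If r1 had read 1 instead, the edge would run Y=1 -> r1=Y, the graph would be acyclic, and `find_cycle` would return None.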

## So... are Computers Wrong?!

SC is how programmers think.

SC prohibits all reordering of instructions

Write buffers let reads finish before older writes

Combining writes saves bandwidth but reorders writes

### Relaxed Memory Consistency

Relaxed Memory Models permit reorderings, unlike SC

#### **x86-TSO** (Intel x86)

"The Write Buffer Memory Model"



Relaxes W->R order

Total Store Order - loads may complete before older stores to different locations complete.

#### PSO (SPARC)

"The Write Combining Memory Model"



Relaxes W->W order

Partial Store Order - loads and stores may complete before older stores to different locations complete.
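The difference between the two models shows up in a message-passing program: Thread 0 runs data = 1; flag = 1 and Thread 1 runs r1 = flag; r2 = data. The minimal sketch below extends a store-buffer machine with a `mode` switch. Under "TSO" each buffer drains strictly oldest-first. Under "PSO" any entry may drain if no older entry in the same buffer targets the same location. These drain rules are this sketch's assumption about how to model the two, and `explore` is a hypothetical name.

```python
def explore(program, mode):
    """Return every final (r1, r2) reachable on a store-buffer machine."""
    locs = sorted({loc for ops in program for _, loc, _ in ops})
    start = (tuple(0 for _ in program), tuple((l, 0) for l in locs),
             tuple(() for _ in program), ())
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem, bufs, regs = state
        if all(pc == len(ops) for pc, ops in zip(pcs, program)) and not any(bufs):
            finals.add(tuple(v for _, v in sorted(regs)))
            continue
        for t, ops in enumerate(program):
            if pcs[t] < len(ops):  # issue thread t's next instruction
                kind, loc, arg = ops[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "W":
                    new_bufs = bufs[:t] + (bufs[t] + ((loc, arg),),) + bufs[t + 1:]
                    stack.append((new_pcs, mem, new_bufs, regs))
                else:
                    pending = [v for l, v in bufs[t] if l == loc]
                    value = pending[-1] if pending else dict(mem)[loc]
                    stack.append((new_pcs, mem, bufs, regs + ((arg, value),)))
            # Which buffer entries may drain next depends on the model.
            drainable = [0] if mode == "TSO" else [
                i for i, (l, _) in enumerate(bufs[t])
                if all(l2 != l for l2, _ in bufs[t][:i])]
            for i in drainable:
                if i < len(bufs[t]):
                    loc, val = bufs[t][i]
                    new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                    new_bufs = bufs[:t] + (bufs[t][:i] + bufs[t][i + 1:],) + bufs[t + 1:]
                    stack.append((pcs, new_mem, new_bufs, regs))
    return finals

MP = (
    (("W", "data", 1), ("W", "flag", 1)),        # Thread 0: data = 1; flag = 1
    (("R", "flag", "r1"), ("R", "data", "r2")),  # Thread 1: r1 = flag; r2 = data
)
print((1, 0) in explore(MP, "TSO"))  # False: seeing flag == 1 implies data == 1
print((1, 0) in explore(MP, "PSO"))  # True: the flag write can drain first
```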

#### In General



Starting from PSO and also relaxing R->R and R->W order yields Weak Ordering or Release Consistency (Alpha), depending on the implementation.
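The progression above can be summarized as a small lookup table. This is a sketch: "WO/RC" is shorthand for the weakest case listed, where the implementation relaxes all four orders between ordinary accesses. The table covers only operations to different locations and ignores synchronization operations, which these models still order. `RELAXED` and `may_reorder` are hypothetical names.

```python
# Program-order pairs (older -> younger, different locations) each model
# lets the hardware reorder, following the slide's progression.
RELAXED = {
    "SC": set(),
    "TSO": {("W", "R")},
    "PSO": {("W", "R"), ("W", "W")},
    "WO/RC": {("W", "R"), ("W", "W"), ("R", "R"), ("R", "W")},
}

def may_reorder(model, older, younger):
    """True if `model` lets `younger` complete before `older` when the two
    ordinary accesses touch different memory locations."""
    return (older, younger) in RELAXED[model]

for model, pairs in RELAXED.items():
    allowed = sorted(f"{a}->{b}" for a, b in pairs)
    print(f"{model:6} relaxes: {', '.join(allowed) or 'nothing'}")
```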

### SC and Relaxed Consistency

SC is required for correctness and programmer sanity, **and** reordering is required\* for performance.

Goal: Ensure SC executions while permitting Relaxed Consistency reorderings

\*Usually; the MIPS memory model is **SC** (surprising!)


## How to ensure SC, but permit reordering?

## Synchronization Prevents Reordering

Memory fences are another type of synchronization



How a fence is implemented depends on which reorderings the hardware performs.

TSO: a fence stalls later reads until the write buffer is empty.
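Here is a minimal sketch of that fence rule, using the same kind of toy TSO machine as earlier (FIFO write buffers, store forwarding). An "F" instruction may issue only once its own thread's buffer is empty. The ("F", None, None) encoding and `tso_outcomes` are hypothetical; on real x86 hardware the corresponding instruction is MFENCE. Placing a fence between each thread's write and read removes the non-SC (0, 0) outcome.

```python
def tso_outcomes(program):
    """Exhaustive search over a toy TSO machine with fences."""
    start = ((0,) * len(program), (("X", 0), ("Y", 0)), ((),) * len(program), ())
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem, bufs, regs = state
        if all(pc == len(ops) for pc, ops in zip(pcs, program)) and not any(bufs):
            finals.add(tuple(v for _, v in sorted(regs)))
            continue
        for t, ops in enumerate(program):
            if bufs[t]:  # drain the oldest buffered write to memory
                (loc, val), rest = bufs[t][0], bufs[t][1:]
                mem2 = tuple(sorted({**dict(mem), loc: val}.items()))
                stack.append((pcs, mem2, bufs[:t] + (rest,) + bufs[t + 1:], regs))
            if pcs[t] == len(ops):
                continue
            kind, loc, arg = ops[pcs[t]]
            pcs2 = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if kind == "F" and not bufs[t]:  # fence stalls while buffer non-empty
                stack.append((pcs2, mem, bufs, regs))
            elif kind == "W":
                bufs2 = bufs[:t] + (bufs[t] + ((loc, arg),),) + bufs[t + 1:]
                stack.append((pcs2, mem, bufs2, regs))
            elif kind == "R":
                pending = [v for l, v in bufs[t] if l == loc]
                value = pending[-1] if pending else dict(mem)[loc]
                stack.append((pcs2, mem, bufs, regs + ((arg, value),)))
    return finals

NO_FENCE = (
    (("W", "X", 1), ("R", "Y", "r1")),
    (("W", "Y", 1), ("R", "X", "r2")),
)
FENCED = (
    (("W", "X", 1), ("F", None, None), ("R", "Y", "r1")),
    (("W", "Y", 1), ("F", None, None), ("R", "X", "r2")),
)
print(sorted(tso_outcomes(NO_FENCE)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(tso_outcomes(FENCED)))    # [(0, 1), (1, 0), (1, 1)]
```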

## Synchronization For Real Programmers

Memory fences are wrapped up inside locks and other synchronization primitives.



Using fences directly is possible, but inadvisable. USE A SYNCHRONIZATION LIBRARY.
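In Python, for example, `threading.Lock` is such a library primitive. The sketch below never touches a fence and relies on the lock to order the updates. The thread and iteration counts are arbitrary choices. In CPython the global interpreter lock also plays a role, so this shows the programming pattern rather than how the ordering is enforced underneath.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # acquire/release orders this update against the others
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```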

#### Data Races

Synchronization imposes happens-before on otherwise unordered operations



Data Race: two operations to the same memory location that are not ordered by happens-before, where at least one of them is a write
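The definition can be checked on a recorded trace with vector clocks, the bookkeeping used by dynamic race detectors. In this minimal sketch, happens-before is program order plus "lock release precedes the next acquire of the same lock". The trace format `(op, thread, target)` and the name `find_races` are hypothetical, and this is not a production tool.

```python
from collections import defaultdict

def find_races(trace, num_threads):
    """Return (i, j) index pairs of same-location accesses, at least one a
    write, that are not ordered by happens-before."""
    # Thread t's clock starts with its own component at 1.
    clocks = [[1 if i == t else 0 for i in range(num_threads)]
              for t in range(num_threads)]
    lock_clocks = {}
    history = defaultdict(list)  # location -> [(thread, epoch, kind, index)]
    races = []
    for idx, (op, t, target) in enumerate(trace):
        c = clocks[t]
        if op == "acq":  # inherit everything the last releaser had seen
            for i, v in enumerate(lock_clocks.get(target, [])):
                c[i] = max(c[i], v)
        elif op == "rel":  # publish this thread's clock, then advance it
            lock_clocks[target] = list(c)
            c[t] += 1
        else:  # "rd" or "wr"
            for u, epoch, kind, j in history[target]:
                ordered = u == t or epoch <= c[u]
                if not ordered and "wr" in (kind, op):
                    races.append((j, idx))
            history[target].append((t, c[t], op, idx))
    return races

RACY = [("wr", 0, "x"), ("wr", 1, "x")]
SYNCED = [("acq", 0, "m"), ("wr", 0, "x"), ("rel", 0, "m"),
          ("acq", 1, "m"), ("wr", 1, "x"), ("rel", 1, "m")]
print(find_races(RACY, 2))    # [(0, 1)]
print(find_races(SYNCED, 2))  # []
```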

## Memory Models across the System Stack

#### Language

Java/C++: SC for data-race-free programs

#### Compiler

Conservative about reordering when it can't prove the program is data-race-free

#### Architecture

Usually very weak for max optimization (lots of reordering)

Note: fences inserted from "above" (by the language implementation and compiler) ensure SC for data-race-free programs