By Tor Jeremiassen and Susan J. Eggers
The algorithms eliminated an average of 64% of false sharing misses across the entire workload, and more than 90% in two programs. How well this reduction translated into improved execution time, however, depended heavily on the memory subsystem architecture and on previous programmer efforts to optimize for locality. On a multiprocessor with a large cache configuration and a high cache miss penalty, the transformations improved the execution time of programmer-unoptimized applications by as much as 60%. On programs where previous programmer efforts to improve data locality had already reduced the original amount of false sharing, and on a multiprocessor with a small cache configuration and a low cache miss penalty, the gains were more modest.
@techreport{JeEg94:FalseSharing,
author="T.E. Jeremiassen and S.J. Eggers",
title={Reducing False Sharing on Shared Memory Multiprocessors through
Compile Time Data Transformations},
institution="University of Washington",
number="94-09-05",
year="1994"
}