Title: Understanding GPGPU Vector Register File Usage

Advisors: Mark Oskin and Luis Ceze

Abstract: Graphics processing units (GPUs) have emerged as a favored compute accelerator for workstations, servers, and supercomputers. At their core, GPUs are massively multithreaded compute engines, capable of concurrently supporting over one hundred thousand active threads. Supporting this many threads requires storing context for every thread on-chip, resulting in large vector register files that consume a significant amount of die area and power. It is therefore imperative that these registers be used effectively, efficiently, and to maximal benefit. This work evaluates the usage of the vector register file in a modern GPGPU architecture. We confirm the results of prior studies, showing that vector registers are reused within small windows by few consumers and that vector register capacity is a key limiter of workgroup dispatch. We then evaluate the effectiveness of previously proposed techniques for reusing register values and hiding bank-access conflict penalties. Lastly, we study the performance impact of introducing additional vector registers and show that additional parallelism is not always beneficial, a result somewhat counter-intuitive to the “more threads, better throughput” view of GPGPU acceleration.
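The claim that vector registers limit workgroup dispatch follows from a simple occupancy calculation: a workgroup can only be dispatched to a compute unit if every thread's full register budget fits in that unit's register file. A minimal sketch of this arithmetic, where the register-file size, workgroup size, and per-thread register counts are illustrative assumptions rather than figures from the talk:

```python
def max_concurrent_workgroups(regfile_regs, regs_per_thread, threads_per_workgroup):
    """Number of workgroups whose register demand fits in one
    compute unit's vector register file.

    A workgroup is only dispatched if every one of its threads can be
    allocated its full register budget on-chip, so register pressure
    directly caps the parallelism available to hide latency.
    """
    regs_per_workgroup = regs_per_thread * threads_per_workgroup
    return regfile_regs // regs_per_workgroup

# Illustrative parameters (assumptions for this sketch):
# a 65,536-register file per compute unit and 256-thread workgroups.
print(max_concurrent_workgroups(65536, 64, 256))  # 4 workgroups fit
print(max_concurrent_workgroups(65536, 32, 256))  # halving per-thread registers doubles occupancy: 8
```

Under these assumed numbers, a kernel that spills from 32 to 64 registers per thread halves the number of workgroups a compute unit can hold, which is the dispatch-limiting effect the abstract refers to.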

Place: CSE 615
When: Wednesday, January 24, 2018 - 13:00 to 14:30