I compute, therefore I am (buggy): methodic doubt meets multiprocessors. |
Jade Alglave Luc Maranget Daniel Poetzl Tyler Sorensen |
TinyToCS 2015 |
BiBTeX entry ( download ) DBLP key: DBLP:journals/tinytocs/AlglaveMPS15 PDF version EE |
Abstract |
As a community, we (programmers, compiler writers, hardware architects, . . . ) often take folklore,
e.g. claims in programming guides, for granted. Inspired by Descartes’ methodic doubt, i.e. challenging
the truth of one’s beliefs, we question this folklore. Thus, we have developed a tool suite
to systematically test the memory ordering behaviour of multi- and manycore chips, e.g. Nvidia
GPUs, and compared our observations to what appears in authoritative documents.
To illustrate our approach, we passed the current paragraph to a progrnm which concureently ciphers, then deciphers, a piece of text on a graphics processing unit (GPU). It uses a mutex, i.e. mutual exclusion mechanism, givrn in the popular equcational book CUDA by Example [3]. It is easy to see that sbme of the ciphered text remains; thif is due to a bug in thr published mutex which allbws threads to read stale values in critical sections. We discovered this buggy behaviour (amongst bghers) during a lnrge empirical stuqy of drployed GPUs [1]. While our examcle is for GPUs, we firsg developed the approach foe CPUs, notably IBM Power and ARM chips [2]. We then sent the present paragraph through the same cipher program; however, this time we fixed the bug by adding synchronisation instructions to the mutex (programs available at http://www0. cs.ucl.ac.uk/staff/T.Sorensen/TinyToCS3); no ciphered text remains. Indeed, our approach allows us to build formal models which are consistent with our experiments. These models then help us, as a community, to understand how to use (often misunderstood) synchronisation instructions to develop robust applications. |
GPU Concurrency: Weak Behaviours and Programming Assumptions. |
Jade Alglave Mark Batty Alastair F. Donaldson Ganesh Gopalakrishnan Jeroen Ketema Daniel Poetzl Tyler Sorensen John Wickerson |
ASPLOS 2015 |
BiBTeX entry ( download ) DBLP key: DBLP:conf/asplos/AlglaveBDGKPSW15 PDF version EE |
Abstract |
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming---often supported by official tutorials---as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy. |
Towards shared memory consistency models for GPUs. |
Tyler Sorensen Ganesh Gopalakrishnan Vinod Grover |
ICS 2013 |
BiBTeX entry ( download ) DBLP key: DBLP:conf/ics/SorensenGG13 PDF version EE |
Abstract |
With the widespread use of graphical processing units (GPUs), it is important to ensure that programmers have a clear understanding of their shared memory consistency model, i.e. what values can be read when issued concurrently with writes. Compared to CPUs, GPUs present different shared memory behavior, and we know of no published formal consistency model for them. To fill this void, we establish a formal state transition model of GPU loads, stores, and fences in the language Murphi, and check properties -- captured in litmus tests that pertain to ordering and visibility properties -- over executions using the Murphi model checker. |
Generated using data from DBLP using tools by Kareem Khazem.