The semantics of shared memory in Intel CPU/FPGA systems

Dan Iorga, Alastair F Donaldson, John Wickerson, Tyler Sorensen

doi:10.1145/3485497

Abstract

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.
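To make the notion of a weak-memory behaviour concrete, the following sketch enumerates the outcomes of the classic store-buffering litmus test under two toy operational models: one where writes reach memory immediately, and one where each thread has a FIFO store buffer. This is not the paper's CPU/FPGA model. The function names `explore` and `updated`, the instruction encoding, and the buffering rule are all assumptions chosen for illustration of the CPU-style behaviours the abstract calls "familiar".

```python
def updated(pairs, key, value):
    """Return a sorted tuple of (key, value) pairs with key set to value."""
    d = dict(pairs)
    d[key] = value
    return tuple(sorted(d.items()))


def explore(programs, buffered):
    """Return every reachable final register valuation.

    programs: one instruction list per thread; an instruction is
    ("W", location, value) or ("R", location, register).
    buffered: if True, writes pass through a per-thread FIFO store buffer
    (a hypothetical, TSO-like rule; not the model from the paper).
    """
    n = len(programs)
    start = ((0,) * n, ((),) * n, (), ())  # pcs, buffers, memory, registers
    stack, seen, outcomes = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        finished = True
        for t in range(n):
            if bufs[t]:  # flush the oldest buffered write to memory
                finished = False
                loc, val = bufs[t][0]
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                stack.append((pcs, new_bufs, updated(mem, loc, val), regs))
            if pcs[t] < len(programs[t]):  # execute the thread's next instruction
                finished = False
                kind, loc, arg = programs[t][pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "W" and buffered:
                    new_bufs = bufs[:t] + (bufs[t] + ((loc, arg),),) + bufs[t + 1:]
                    stack.append((new_pcs, new_bufs, mem, regs))
                elif kind == "W":
                    stack.append((new_pcs, bufs, updated(mem, loc, arg), regs))
                else:
                    # A read sees its own newest buffered write, else memory (default 0).
                    val = dict(mem).get(loc, 0)
                    for buf_loc, buf_val in bufs[t]:
                        if buf_loc == loc:
                            val = buf_val
                    stack.append((new_pcs, bufs, mem, updated(regs, arg, val)))
        if finished:  # all instructions executed and all buffers drained
            outcomes.add(regs)
    return outcomes


# Store buffering: each thread writes one location, then reads the other.
SB = (
    [("W", "x", 1), ("R", "y", "r0")],
    [("W", "y", 1), ("R", "x", "r1")],
)
WEAK = (("r0", 0), ("r1", 0))  # both reads miss the other thread's write

if __name__ == "__main__":
    print("immediate writes:", sorted(explore(SB, buffered=False)))
    print("store buffers:   ", sorted(explore(SB, buffered=True)))
```

In this toy setting the outcome `WEAK` is reachable only when buffering is enabled, which is the sense in which a litmus test separates a weaker model from a stronger one.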

Highlights

The end of Dennard scaling in the early 2000s led to CPU designers resorting to duplicating processor cores to make computational gains, exploiting additional transistors that became available year on year thanks to Moore’s law [Rupp 2015].
A recent trend in heterogeneous systems is to combine a homogeneous multicore CPU with a field-programmable gate array (FPGA). These combined CPU/FPGA systems are of special interest because the FPGA component can be configured to represent one or more processing elements customised for a particular computationally-intensive sub-task, while the overall application can be written to run on the general-purpose CPU.
We have mechanised the operational semantics in C, in a form suitable for analysis with the CBMC model checker [Clarke et al 2004].

Summary

INTRODUCTION

The end of Dennard scaling in the early 2000s led CPU designers to duplicate processor cores to make computational gains, exploiting the additional transistors that became available year on year thanks to Moore’s law [Rupp 2015]. Our contribution is a detailed formal case study of the memory semantics of Intel’s latest CPU/FPGA systems. These combine a multicore Xeon CPU with an Intel FPGA, and allow them to share main memory through Intel’s Core Cache Interface (CCI-P) [Intel 2019]. Using a back-end that converts an execution into a corresponding C program, we have used executions generated from the axiomatic model, together with the CBMC model checker, to validate our operational model both ‘from above’ and ‘from below’; that is, every disallowed execution generated from the axiomatic model is disallowed by the operational model, and removing any event from such an execution causes it to become allowed by the operational model. This combination of mechanised operational and axiomatic semantics allowed us to set up a virtuous cycle in which we would cross-check the models using a batch of generated tests, find a discrepancy, confirm the correct behaviour by referring to the manual or discussing with an Intel engineer, refine our axioms or our operational model, and repeat. Other listed contributions are the design of a soft-core processor that allows memory-model litmus tests to be executed efficiently on FPGA hardware (Section 5), and more complicated CPU/FPGA interactions, explored using standard litmus tests instantiated for the X+F system, where one thread is on the CPU and the other is on the FPGA (Section 2.2).
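The ‘from above’ and ‘from below’ validation can be read as a minimality check: an execution must be disallowed, and dropping any single event must make it allowed. The sketch below states that check against an abstract oracle. In the source the oracle is the operational model run through CBMC; here `allowed` is a stand-in predicate, and `is_minimally_forbidden`, `toy_allowed` and the event names are hypothetical. Representing an execution as a plain tuple of events is also a simplification, since it ignores the relations between events.

```python
from typing import Callable, Hashable, Tuple

Execution = Tuple[Hashable, ...]


def is_minimally_forbidden(
    events: Execution, allowed: Callable[[Execution], bool]
) -> bool:
    """Check an execution 'from above' and 'from below'.

    From above: the full execution must be disallowed by the oracle.
    From below: removing any single event must yield an allowed execution.
    """
    if allowed(events):
        return False
    return all(
        allowed(events[:i] + events[i + 1:]) for i in range(len(events))
    )


def toy_allowed(events: Execution) -> bool:
    """Stand-in oracle (an assumption): forbid executions containing a, b and c."""
    return not {"a", "b", "c"} <= set(events)


if __name__ == "__main__":
    print(is_minimally_forbidden(("a", "b", "c"), toy_allowed))
    print(is_minimally_forbidden(("a", "b", "c", "d"), toy_allowed))
```

With the toy oracle, `("a", "b", "c")` passes both checks, while `("a", "b", "c", "d")` fails the second one because removing `d` leaves a still-forbidden execution.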

FPGA Coherency

Implementing Litmus Tests on the FPGA

Actions

States

Justifications for Modelling Decisions

CBMC Implementation and Litmus Tests

Executions

Consistency Axioms

Generating Executions from the Axioms

Cross-checking the Axiomatic and Operational Models

EXPERIMENTAL EVALUATION

CASE STUDY

Implementation

Performance Comparison

Exploring Incorrect Behaviour

FURTHER RELATED WORK

Findings

CONCLUSION


Journal: Proceedings of the ACM on Programming Languages
Publication Date: Oct 15, 2021
Citations: 6
License type: cc-by


