Abstract

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.
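The abstract's point that insufficient synchronisation can corrupt a producer/consumer buffer can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's operational model, and it makes no claim about CCI-P's actual ordering guarantees. It assumes, purely for illustration, that FPGA writes enter an unordered pool of pending writes that may reach memory in any order, that a hypothetical `FENCE` instruction waits until that pool is empty, and that the CPU reads memory in program order. All names (`outcomes`, `_set`, the instruction tuples) are hypothetical. Exhaustively exploring every interleaving of a message-passing test then shows the stale-data outcome appearing without the fence and disappearing with it.

```python
from typing import List, Set, Tuple

# Hypothetical instruction encoding for this toy model:
#   ("W", loc, val)  FPGA write, placed in an unordered pool of pending writes
#   ("FENCE",)       FPGA fence: waits until every pending write has reached memory
#   ("R", loc, reg)  CPU read straight from shared memory into a register
Instr = Tuple
Bindings = Tuple[Tuple[str, int], ...]  # sorted (name, value) pairs


def _set(pairs: Bindings, key: str, val: int) -> Bindings:
    """Return a sorted copy of pairs with key bound to val."""
    d = dict(pairs)
    d[key] = val
    return tuple(sorted(d.items()))


def outcomes(fpga: List[Instr], cpu: List[Instr], init: Bindings) -> Set[Bindings]:
    """Enumerate every final CPU register state reachable in the toy model."""
    results: Set[Bindings] = set()
    seen = set()
    stack = [(init, frozenset(), 0, 0, ())]  # (memory, pending, fpga pc, cpu pc, regs)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        mem, pending, fpc, cpc, regs = state
        if fpc == len(fpga) and cpc == len(cpu) and not pending:
            results.add(regs)
            continue
        # The FPGA issues its next instruction (a fence blocks while writes are pending).
        if fpc < len(fpga):
            ins = fpga[fpc]
            if ins[0] == "W":
                stack.append((mem, pending | {(fpc, ins[1], ins[2])}, fpc + 1, cpc, regs))
            elif ins[0] == "FENCE" and not pending:
                stack.append((mem, pending, fpc + 1, cpc, regs))
        # Any pending FPGA write may reach memory next, in any order.
        for w in pending:
            _, loc, val = w
            stack.append((_set(mem, loc, val), pending - {w}, fpc, cpc, regs))
        # The CPU performs its next read, in program order.
        if cpc < len(cpu):
            _, loc, reg = cpu[cpc]
            stack.append((mem, pending, fpc, cpc + 1, _set(regs, reg, dict(mem)[loc])))
    return results


INIT = (("data", 0), ("flag", 0))
READER = [("R", "flag", "r0"), ("R", "data", "r1")]
UNFENCED = [("W", "data", 1), ("W", "flag", 1)]
FENCED = [("W", "data", 1), ("FENCE",), ("W", "flag", 1)]
STALE = (("r0", 1), ("r1", 0))  # consumer sees the flag set but reads old data

print(STALE in outcomes(UNFENCED, READER, INIT))  # True
print(STALE in outcomes(FENCED, READER, INIT))    # False
```

Modelling pending writes as an unordered set, rather than a per-channel queue, is the most conservative choice for a sketch: it allows every reordering, so the only ordering the consumer can rely on is the one the fence enforces.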

Highlights

  • The end of Dennard scaling in the early 2000s led to CPU designers resorting to duplicating processor cores to make computational gains, exploiting additional transistors that became available year on year thanks to Moore’s law [Rupp 2015].

  • A recent trend in heterogeneous systems is to combine a homogeneous multicore CPU with a field-programmable gate array (FPGA). These combined CPU/FPGA systems are of special interest because the FPGA component can be configured to represent one or more processing elements customised for a particular computationally-intensive sub-task, while the overall application can be written to run on the general-purpose CPU.

  • We have mechanised the operational semantics in C, in a form suitable for analysis with the CBMC model checker [Clarke et al. 2004].


Summary

INTRODUCTION

The end of Dennard scaling in the early 2000s led CPU designers to resort to duplicating processor cores to make computational gains, exploiting the additional transistors that became available year on year thanks to Moore’s law [Rupp 2015]. Our contribution is a detailed formal case study of the memory semantics of Intel’s latest CPU/FPGA systems. These combine a multicore Xeon CPU with an Intel FPGA and allow the two to share main memory through Intel’s Core Cache Interface (CCI-P) [Intel 2019].

Using a back-end that converts an Alloy-generated execution into a corresponding C program, we have used these executions and the CBMC model checker to validate our operational model both ‘from above’ and ‘from below’; that is, every execution generated as disallowed by the axiomatic model is also disallowed by the operational model, and removing any event from such an execution causes it to become allowed by the operational model. This combination of a mechanised operational semantics and a mechanised axiomatic semantics allowed us to set up a virtuous cycle: we would cross-check the models using a batch of generated tests, find a discrepancy, confirm the correct behaviour by referring to the manual or discussing it with an Intel engineer, refine our axioms or our operational model, and repeat.

Further contributions include the design of a soft-core processor that allows memory-model litmus tests to be executed on FPGA hardware efficiently (Section 5), and a discussion of more complicated CPU/FPGA interactions using standard litmus tests, instantiated for the X+F system, where one thread is on the CPU and the other is on the FPGA (Section 2.2).
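The ‘from above’ and ‘from below’ criterion can be phrased as a simple check over the executions that the axiomatic model disallows. The sketch below is a hypothetical simplification, not the paper's tooling: executions are plain sets of event labels, and the operational model is a yes/no predicate `op_allows` (in the paper, this role is played by CBMC queries generated from Alloy executions). The function `check_above_and_below` and the toy model `toy_op_allows` are illustrative names only.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical simplification: an execution is just a set of event labels.
Execution = FrozenSet[str]


def check_above_and_below(
    disallowed_by_axioms: Iterable[Execution],
    op_allows: Callable[[Execution], bool],
) -> List[Tuple[Execution, str]]:
    """Return (execution, reason) for every execution that fails the cross-check."""
    failures: List[Tuple[Execution, str]] = []
    for ex in disallowed_by_axioms:
        # 'From above': the operational model must also disallow the execution.
        if op_allows(ex):
            failures.append((ex, "allowed by operational model"))
            continue
        # 'From below': removing any single event must make the execution allowed.
        for ev in sorted(ex):
            if not op_allows(ex - {ev}):
                failures.append((ex, f"still disallowed without {ev}"))
                break
    return failures


# Toy stand-in for the operational model (hypothetical): it forbids exactly
# the message-passing outcome in which all four events below occur together.
MP = frozenset({"W data=1", "W flag=1", "R flag=1", "R data=0"})


def toy_op_allows(ex: Execution) -> bool:
    return not MP <= ex


mp_plus = MP | {"W z=1"}  # disallowed, but not minimal
# Reports only mp_plus, which stays disallowed after removing "W z=1".
print(check_above_and_below([MP, mp_plus], toy_op_allows))
```

A failure ‘from above’ points to a behaviour the operational model permits but the axioms forbid; a failure ‘from below’ flags a generated execution containing an event that plays no part in making it forbidden.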

FPGA Coherency
Implementing Litmus Tests on the FPGA
Actions
States
Justifications for Modelling Decisions
CBMC Implementation and Litmus Tests
Executions
Consistency Axioms
Generating Executions from the Axioms
Cross-checking the Axiomatic and Operational Models
EXPERIMENTAL EVALUATION
CASE STUDY
Implementation
Performance Comparison
Exploring Incorrect Behaviour
FURTHER RELATED WORK
Findings
CONCLUSION
