The challenges of using an embedded MPI for hardware-based processing nodes

Daniel L Ly,Manuel Saldana,Paul Chow

doi:10.1109/fpt.2009.5377688

Abstract

This paper presents several challenges and solutions in designing an efficient Message Passing Interface (MPI) implementation for embedded FPGA applications. Popular MPI implementations are designed for general-purpose computers which have significantly different properties and trade-offs than embedded platforms. Our work focuses on two types of interactions that are not present in typical MPI implementations. First, a number of improvements designed to accelerate software-hardware interactions are introduced, including a Direct Memory Access (DMA) engine with MPI functionality; the use of non-interrupting, non-blocking messages; and a proposed function, called MPI_Coalesce, to reduce the function call overhead from a series of sequential messages. These improvements resulted in a speed-up of 5-fold compared to an embedded software-only MPI implementation. Next, a novel dataflow message passing model is presented for hardware-hardware interactions to overcome the limitations of atomic messages, allowing hardware engines to communicate and compute simultaneously. This dataflow model provides a natural method for hardware designers to build high performance, MPI systems. Finally, two hardware cores, Tee cores and message watchdog timers, are introduced to provide a transparent method of debugging hardware MPI designs.

Full Text