High-performance, deep neural networks with sub-microsecond latency on FPGAs for trigger applications

Noel Nottbeck, Volker Büscher, Christian Schmitt

doi:10.1088/1742-6596/1525/1/012046

Open Access

Abstract

Modern high-end FPGAs, which are often used for hardware-level trigger applications, offer enough arithmetic performance to include artificial neural networks (ANNs) of considerable size in such systems. Yet there are only very few examples of ANNs in high-performance hardware triggers, mainly because FPGA development is complex and time-consuming and because an optimized design is needed to make efficient use of the FPGA's capabilities. We developed a library that provides three types of layers: fully connected (dense) layers, 2D multi-channel convolution layers, and maximum pooling layers. For maximum design control, these were written in VHDL and optimized for the specific data flow and control requirements of each layer type. This made it possible to reach processing frequencies of several hundred MHz with only little resource overhead beyond what the actual computation of each layer requires. Furthermore, we created a Python-based toolkit that builds on these layer implementations, so that a trained network from the Keras framework can be turned into FPGA firmware and initialization data without requiring in-depth understanding from the user. The resulting (deep) network designs can process data arriving at several tens of MHz, run at processing frequencies of several hundred MHz, and achieve latencies ranging from tens to a few hundred nanoseconds, depending on the network size.
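To make the idea of "initialization data" and the quoted latency range more concrete, the following is a minimal Python sketch, not the toolkit described in the abstract. It assumes a signed 16-bit fixed-point format with 8 fractional bits and a ReLU activation, neither of which is stated in the abstract, and the function names `to_fixed_point`, `dense_fixed`, and `latency_ns` are hypothetical. It shows how trained floating-point weights might be quantized to integers, how a dense layer could be evaluated in integer arithmetic as a software reference, and how a pipeline depth in clock cycles translates into nanoseconds.

```python
from typing import List


def to_fixed_point(value: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer (assumed format).

    The value is scaled by 2**frac_bits, rounded, and saturated to the
    range of a signed total_bits-wide integer.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def dense_fixed(
    inputs: List[int],
    weights: List[List[int]],
    biases: List[int],
    frac_bits: int = 8,
) -> List[int]:
    """Reference model of a dense layer in integer arithmetic.

    weights[j][i] is the weight from input i to neuron j. All values are
    fixed-point integers with frac_bits fractional bits. ReLU is an
    assumed activation, chosen only for illustration.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        # Products carry 2 * frac_bits fractional bits, so align the bias.
        acc = bias << frac_bits
        for x, w in zip(inputs, row):
            acc += x * w
        # Shift back to frac_bits fractional bits, then apply ReLU.
        outputs.append(max(0, acc >> frac_bits))
    return outputs


def latency_ns(pipeline_cycles: int, clock_mhz: float) -> float:
    """Latency of a pipeline with the given depth at the given clock."""
    return pipeline_cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    x = [to_fixed_point(1.0), to_fixed_point(0.5)]
    w = [[to_fixed_point(0.5), to_fixed_point(1.0)]]
    b = [to_fixed_point(0.0)]
    print(dense_fixed(x, w, b))   # one neuron: 1.0*0.5 + 0.5*1.0 in fixed point
    print(latency_ns(40, 400.0))  # 40 cycles at an assumed 400 MHz clock
```

As an illustration of the scale involved, a hypothetical 40-stage pipeline clocked at 400 MHz would have a latency of 100 ns, which falls within the range of tens to a few hundred nanoseconds quoted above. The actual pipeline depths and clock rates depend on the network and the device.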
