Abstract

Transformer-type neural networks (NNs) have achieved impressive accuracy in Natural Language Processing (NLP) applications where Recurrent Neural Networks (RNNs) were previously the standard, often surpassing them. However, because transformers differ considerably from common NN types, existing accelerator designs, particularly for Field-Programmable Gate Arrays (FPGAs), cannot be used to implement them. Previous research has shown that FPGAs are superior to CPUs and even GPUs in energy efficiency when accelerating NNs. Yet, although automated compiler-based design flows have been developed for NNs, no such approach exists for transformers targeting FPGAs. To close this gap, this paper presents a novel compiler called TRAC, together with a library of operators and modules for implementing transformer accelerators on FPGAs. The compiler's optimization and code generation settings define a design space of accelerators, which is explored using an integrated approach that combines weight compression techniques with corresponding adaptations of the accelerator modules. For each design, a system-level data path and control unit architecture is generated that integrates the module-level designs via hierarchical High-Level Synthesis (HLS). We evaluate our implementation on the BERT network and report results on the trade-off between execution time, accuracy, and FPGA resource usage.
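As a concrete illustration (not taken from the paper itself), the module-level operators such a library would contain are largely dense matrix products over compressed, low-bit-width weights, since transformer attention and feed-forward blocks reduce to such kernels. The sketch below shows what one tile-level HLS operator of this kind might look like; the function name gemv_int8, the tile sizes, and the pragma choices are assumptions for illustration, not TRAC's actual interface.

```cpp
// Hypothetical sketch of a module-level HLS operator: a matrix-vector
// product over 8-bit (compressed) weights, the kind of kernel that
// transformer attention and feed-forward layers reduce to.
// Names, tile sizes, and pragmas are illustrative, not TRAC's API.
#include <cstdint>

constexpr int ROWS = 64;  // output dimension of this weight tile
constexpr int COLS = 64;  // input dimension of this weight tile

void gemv_int8(const int8_t weights[ROWS][COLS],
               const int8_t x[COLS],
               int32_t y[ROWS]) {
  // Partition the input vector so all COLS elements can be read in parallel.
#pragma HLS ARRAY_PARTITION variable=x complete
  for (int r = 0; r < ROWS; ++r) {
    // Produce one output element per cycle once the pipeline fills.
#pragma HLS PIPELINE II=1
    int32_t acc = 0;
    for (int c = 0; c < COLS; ++c) {
      // The inner dot product is fully unrolled into parallel MACs.
#pragma HLS UNROLL
      acc += static_cast<int32_t>(weights[r][c]) * x[c];
    }
    y[r] = acc;  // 32-bit accumulator avoids overflow of int8 products
  }
}
```

The pragmas are ignored by a standard C++ compiler, so the same source doubles as a functional model for verification and as synthesizable input to an HLS tool; varying the weight bit width and tile sizes is one way the described design space could be spanned.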
