An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks

Jaco Hofmann

doi:10.25534/tuprints-00010355

Abstract

Field Programmable Gate Arrays (FPGAs) are a new addition to the world of data center acceleration. While the underlying technology has been around for decades, their application in data centers slowly starts gaining traction. However, there are myriad problems that hinder the widespread application of FPGAs in the data center. The closed source tool chains result in vendor lock-in and unstable tool flows. The languages used to program FPGAs require different design processes which are not easily learned by software developers. Compared to commodity solutions using CPUs and GPUs, FPGAs are more expensive and more time consuming to develop for. All of this and more make FPGAs a tough sell to people in need of task acceleration. Nonetheless, FPGAs also offer an opportunity to develop faster accelerators with a smaller energy envelop for rapidly changing applications. This work presents a solution to FPGA abstraction using the TaPaSCo framework. TaPaSCo simplifies moving between different FPGA architectures and automates scaling of accelerators across a multitude of devices. In addition, the framework provides a homogenized way of interacting with the accelerators. This thesis presents applications where FPGAs offer many benefits in the data center. Applications such as Semi-Global Block Matching which are difficult to compute on CPUs and GPUs due to the specific data transfer patterns, can be implemented highly efficiently an FPGAs. The presented work achieves over 35x of speedup on FPGAs compared to implementations of GPUs. FPGAs can also be used to improve network efficiency in the data center by replacing central network components with smart switches. The work presented here achieves up to 7x speedup over a classical distributed software implementation in a hash join scenario. Furthermore, FPGA can be used to bring new storage technologies into the data center by providing highly efficient consensus services right inside the network. The presented work shows that fetching pages remotely using a FPGA accelerated consensus system can be done as fast as 10us over the network which is only 55% of a conventional solution. These results make non-volatile network storage solutions as replacement for main memory viable. Lastly, this thesis presents a way of simulating parts of a brain with a very high level accuracy using FPGA. The spiking neural networks employed in the accelerator can benefit the research of brain functionality. The accelerator is capable of handling tens of thousands of neurons with a strict real time requirement of 50us per simulation step.

Full Text