Abstract

General-purpose processors, graphics processing units (GPUs), and field-programmable gate arrays (FPGAs) compete and collaborate to offer ever-increasing performance. Nevertheless, despite decades of fruitful research, FPGAs remain far more difficult to exploit than processor-based approaches. It is now possible to automatically map C/C++/SystemC algorithms into circuits. However, exploiting fine-grain parallelism in control-dominant applications is still reserved for highly specialized hardware designers. This paper presents the application of our synchronized-transfer-level hardware design methodology to the implementation of pipelined floating-point operators. The methodology builds on a hardware description language in which the designer manages dynamic connections between data token sources and sinks. A compiler automates the generation and optimization of the synchronization logic, whose low-level complexity is thus hidden from the designer. Applied to the design of a floating-point matrix multiplication hardware accelerator, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with shorter design times (hours instead of days), simpler source code, and no need for advanced hardware design skills.
