Abstract

The recent advancement in software industry such as Microsoft utilizing FPGAs (Field Programmable Gate Arrays) for acceleration in its search engine Bing and Intel's initiative to have its CPU along with Altera FPGA in the same chip indicates FPGA's potential as well as growing demand in the field of high performance computing. FPGAs provide accelerated computation due to their flexible architecture. However it creates challenges for the system designer as efficient design in terms of latency, power and energy demands hardware programming expertise. Hardware coding is a time consuming as well as an error prone task. High Level Synthesis (HLS) addresses these challenges by enabling programmer to code in High-level languages (HLL) such as C, C++, SystemC, CUDA and translating this code to hardware language such as Verilog or VHDL. Even though HLS tools provide several optimizations, their performance is limited due to the implementation constraints. Some of the software constructs widely used in high level language such as dynamic memory allocation, pointer-based data structures and recursion are very hard to implement well in hardware and thereby restricting the performance of HLS. Source-to-source translation is a mechanism to optimize the code in HLL so that the compiler can perform better in terms of code optimization. This article investigates whether the source-to-source translation widely used in HLL can also benefit high level synthesis. For this study, Bones source-to-source compiler is selected to perform the translation of C code to C (Optimized-C) and OpenMP code. These three types of code: C, Optimized-C and OpenMP were synthesized in LegUP HLS for three benchmarks; the performance statistics were measured for all the nine cases and analysis was conducted in terms of speedup, area reduction, power and energy consumption. OpenMP code performed better as compared to original C code in terms of execution time (speedup range 1.86–3.49), area (gain range 1–6.55) and energy (gain range 1.86–3.55). However optimized-C code did not always perform better than the original C-code in terms of execution time (speedup range 0.27–3.08), area (gain range 0.83–5.7) and energy (gain range 0.27–3.13). The power statistics observed were almost the same for all the three input versions of the code.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.