Abstract

Over the past few years there has been increased interest in building custom computing machines (CCMs) as a way of achieving very high performance on specific problems. The advent of high density field programmable gate arrays (FPGAs), in combination with new synthesis tools, have made it relatively easy to produce programmable custom machines without building specific hardware. In many cases, the performance achieved by a FPGA based custom computer is attributed to the exploitation of massive concurrency in the underlying application. In this paper we explore the sources of speedup for irregular problems in which is difficult to exploit such parallelism. We highlight 5 main sources of speedup that we have observed, namely the provision of high memory bandwidth, the use of flexible address generation hardware, the use of gather-scatter array operations, the use of lookup tables and the use of multiple tailored arithmetic units. By considering some representative examples of such irregular problems, the paper illustrates that good performance is possible given the current generation of FPGA devices and RISC processors. The paper then explores whether this performance gain will be possible given the next generation of RISC processors and FPGAs. It concludes that the only way to maintain the speedup is to alter the architecture of CCMs in combination with architectural changes to the FPGAs themselves.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.