Automating customized computing

Jason Cong

doi:10.1109/fpt.2014.7082743

Abstract

Customized computing has been of interest to the research community for over three decades. The interest has intensified in recent years as power and energy have become a significant limiting factor for the computing industry. For example, the energy consumed by the datacenters of some large internet service providers is well over 10^9 kilowatt-hours. FPGA-based acceleration has shown 10–1000X gains in performance/energy efficiency over general-purpose processors in many applications. However, programming FPGAs as computing devices is still a significant challenge. Most accelerators are designed using manual RTL coding. Recent progress in high-level synthesis (HLS) has improved programming productivity considerably: one can quickly implement functional blocks written in high-level programming languages such as C or C++ instead of RTL. But in using an HLS tool for accelerated computing, the programmer still faces many design decisions, such as the implementation choice for each module and the communication schemes between modules, and has to implement additional logic for data management, such as memory partitioning, data prefetching, and reuse. Extensive source code rewriting is often required to achieve high-performance acceleration using the existing HLS tools. In this talk, I shall present the ongoing work at UCLA to enable further automation for customized computing. One effort is on automated compilation that combines source-code-level transformation for HLS with efficient parameterized architecture template generation. I shall highlight our progress on loop restructuring and code generation, memory partitioning, data prefetching and reuse, and combined module selection, duplication, and scheduling with communication optimization. These techniques allow the programmer to easily compile computation kernels to FPGAs for acceleration.
Another direction is to develop efficient runtime support for scheduling and transparent resource management to integrate FPGAs for datacenter-scale acceleration, which is becoming a reality (for example, Microsoft recently used over 1,600 servers with FPGAs to accelerate its search engine and reported very encouraging results). Our runtime system provides scheduling and resource management support at multiple levels, including the server-node level, job level, and datacenter level, so that programmers can make use of existing programming interfaces, such as MapReduce or Hadoop, for large-scale distributed computation.
