MInGLE

Cecilia González-Álvarez, Jennifer B Sartor, Carlos Álvarez, Lieven Eeckhout, Daniel Jiménez-González

doi:10.1145/2898356

Abstract

The end of Dennard scaling has prompted new research directions that try to cope with the utilization wall in modern chips, such as the design of specialized architectures. Processor customization uses transistors more efficiently, optimizing not only for performance but also for power. However, specializing hardware for each individual application is costly and impractical under time-to-market constraints. Domain-specific specialization is an alternative that can increase hardware reuse across applications that share similar computations. This article explores the specialization of low-power processors with custom instructions (CIs) that run on a specialized functional unit. We are the first, to our knowledge, to design CIs for an application domain and across basic blocks, selecting CIs that maximize both performance and energy-efficiency improvements. We present the Merged Instructions Generator for Large Efficiency (MInGLE), an automated framework that identifies and selects CIs. Our framework analyzes large sequences of code (across basic blocks) to maximize acceleration potential, while also performing partial matching across applications to optimize for reuse of the specialized hardware. To do this, we convert the code into a new canonical representation, the Merging Diagram, which represents the code's functionality instead of its structure. This is key to finding similarities across such large code sequences from different applications with different coding styles. Groups of potential CIs are clustered by similarity score to effectively reduce the search space. Additionally, we create new CIs that cover not only whole-body loops but also fragments of the code, to further improve hardware reuse. For a set of 11 applications from the media domain, our framework generates CIs that significantly improve the energy-delay product (EDP) and performance speedup. CIs with the highest utilization opportunities achieve an average EDP improvement of 3.8× compared to a baseline processor modeled after an Intel Atom. We demonstrate that we can efficiently accelerate a domain with partially matched CIs, and that their design time, from identification to selection, stays within tractable bounds.
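
As a reading aid for the EDP figures above, the following is a minimal sketch of how an EDP improvement factor is conventionally computed: EDP is energy multiplied by execution time, and the improvement is the baseline EDP divided by the EDP of the accelerated run. The function names and the numbers in the usage lines are hypothetical and are not taken from the article; the abstract does not state how the reported average is formed, so no averaging is shown.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_improvement(base_energy: float, base_delay: float,
                    new_energy: float, new_delay: float) -> float:
    """Ratio of baseline EDP to accelerated EDP (values > 1 mean improvement)."""
    return edp(base_energy, base_delay) / edp(new_energy, new_delay)


# Hypothetical measurements, for illustration only (not from the article):
# a baseline run using 2.0 J over 1.0 s, and a run with custom
# instructions using 1.0 J over 0.5 s.
factor = edp_improvement(2.0, 1.0, 1.0, 0.5)
print(f"EDP improvement: {factor:.1f}x")
```

Because EDP multiplies the two quantities, halving both energy and delay yields a fourfold improvement, which is why a CI that reduces execution time and energy together is rewarded more than one that improves only a single metric.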

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Jun 14, 2016
Citations: 28
