Abstract

The design of high-performance application-specific multi-core processor systems is still a time-consuming task that involves many manual steps and decisions that must be made by experienced design engineers. The ASAM project sought to change this by proposing an automatic architecture synthesis and mapping flow aimed at the design of such application-specific instruction-set processor (ASIP) systems. The ASAM flow separated the design problem into two cooperating exploration levels: macro-level and micro-level exploration. This paper presents an overview of the micro-level exploration, which is concerned with the analysis and design of the individual processors within the overall multi-core design, from the initial exploration stages up to the selection of the final design of each processor in the system. The designed processors combine very long instruction word (VLIW), single-instruction multiple-data (SIMD), and complex custom DSP-like operations to provide area- and energy-efficient, high-performance execution of the program parts assigned to the processor node.

In this paper we present an overview of how the micro-level design space exploration interacts with the macro-level exploration, how early performance estimates are used within the ASAM flow to determine the tasks executed by each processor node, and how an initial processor design is then proposed and refined into a highly specialized VLIW ASIP. The micro-level architecture exploration is then demonstrated with a walk-through of the process on an example program kernel to further clarify the exploration and architecture specialization process.

The main findings of the experimental research are that the presented method enables automatic instruction-set architecture synthesis for VLIW ASIPs within a reasonable exploration time. Using the presented approach, we were able to automatically determine an initial architecture prototype that met the temporal performance requirements of the target application. Subsequent refinement of this architecture considerably reduced both the design area (by 4x) and the active energy consumption (by 2x).

