Application-specific processors and architectures are becoming increasingly important across all fields of computing, from embedded to high-performance systems. These architectures and systems require efficient arithmetic algorithms, compilers, operating systems, and specialized applications to achieve optimum performance under strict constraints, including size, power, throughput, and operating frequency. This special double issue focuses on the latest developments and solutions concerning application-specific processors and architectures from the hardware and software perspectives. It consists of eleven papers divided into three categories: compilers and operating systems, arithmetic algorithms, and application/algorithm specialization. The first paper in the compilers and operating systems category, “Compact Code Generation for Tightly-Coupled Processor Arrays”, by Boppu et al., presents methods for code compaction and generation for programmable tightly-coupled processor arrays consisting of small, interconnected, light-weight VLIW cores. The authors integrate these methods into a design tool, evaluate them on benchmarks, and compare the results with those of existing compiler frameworks. The methods exploit compute-intensive nested loops, providing design entry in the form of a functional programming language and loop parallelization in the polyhedron model. They also support zero-overhead looping, not only for the innermost loops but also for arbitrarily nested loops. The next paper, “Symbolic Mapping of Loop Programs onto Processor Arrays”, by Teich et al., presents a symbolic solution to the problem of jointly tiling and scheduling a loop nest with uniform data dependencies. This challenge arises when the size and number of processors available for parallel loop execution are not known at compile time.
The paper discusses a solution for deriving parameterized latency-optimal schedules statically, proposing a two-step approach that determines two potential schedules. Once the size of the processor array becomes known at run time, simple comparisons of latency-determining expressions decide which of these schedules is selected dynamically and which corresponding program configuration is executed on the resulting processor array, avoiding any further run-time optimization or expensive recompilation. In the paper “Virtualized Execution and Management of Hardware Tasks on a Hybrid ARM-FPGA Platform”, by Jain et al., the authors focus on managing the execution of hardware tasks within a processor-based system and, in doing so, on how to virtualize the resources to ensure isolation and predictability. The authors use a microkernel-based hypervisor running on a commercial hybrid (FPGA-based) computing platform. The hypervisor leverages the capabilities of the FPGA fabric, with support for discrete hardware accelerators, dynamically reconfigurable regions, and regions of virtual fabric. The authors study the communication overheads, quantify the context switch overhead of the hypervisor approach, and compare it with the idle time of a standard Linux implementation, showing an improvement of two orders of magnitude. The final compilers and operating systems paper, “A Novel Object-Oriented Software Cache for Scratchpad-Based Multi-Core Clusters”, by Pinto et al., presents a software cache implementation for an accelerator fabric, with a special focus on object-oriented caching techniques aimed at reducing the global overhead introduced by the proposed software cache. The authors validate the software cache approach with a set of experiments and three case studies of the object-oriented software cache in computer vision applications. There are four papers related to application implementations in this special issue. The first paper, “Hardware Acceleration of
M. C. Smith (*), Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA; e-mail: smithmc@clemson.edu