Latest Advances on Design and Implementation of DSP Systems

Warren J. Gross, Zhiyuan Yan

doi:10.1007/s11265-014-0930-z

Abstract

New challenges for implementations and design methodologies have been introduced over the years, and the trend seems set to continue. Increasingly complex algorithms need to be implemented with higher performance but under strict constraints on power consumption. Implementation techniques, design methodologies, and algorithm-architecture optimizations all need to improve to tackle the challenges in present signal and information processing systems. This special issue contains a selection of papers reporting the latest advances in the design and implementation of signal processing systems. The topics range from circuit-level architectures for memories and arithmetic to dynamic code generation and to application implementations on mobile devices or heterogeneous platforms.

In their paper "Improving the Reliability of MLC NAND Flash Memories through Adaptive Data Refresh and Error Control Coding", Yang, Chen, Mudge, and Chakrabarti propose a combination of data refresh policies and low-cost error control coding (ECC) schemes to address errors in multilevel cell (MLC) NAND Flash memories, given application characteristics. They show that an appropriate choice of refresh interval and BCH-based ECC scheme can minimize memory energy while satisfying the reliability constraint.

Hardware implementation of associative memories based on message-passing algorithms on sparse graphs is described in "Algorithm and Architecture of Fully-Parallel Associative Memories Based on Sparse Clustered Networks" by Jarollahi, Onizawa, Gripon, and Gross. The authors derive architectures that eliminate the need for computationally complex winner-take-all circuits. This improves the clock frequency by about a factor of 2 and reduces circuit size. A design space exploration is provided, and the hardware complexity of the FPGA implementations is described.
Antao and Sousa propose an implementation of an arithmetic accelerator for modular arithmetic based on the residue number system (RNS) in their paper "A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS". They propose an architecture of processing elements connected as a ring. The architecture is fully parallel and scales to different algorithms and operand sizes. Implementations on FPGAs are provided.

"A Real-time Scalable Object Detection System using Low-Power HOG Accelerator VLSI" by Takagi, Tanaka, Izumi, Kawaguchi, and Yoshimoto proposes a real-time object detection system that uses a Histogram of Oriented Gradients (HOG) feature extraction accelerator and a reconfigurable multiply-accumulate array to support processing objects of different shapes. The proposed approach uses a support vector machine for early classification. The system has been implemented in a 65 nm CMOS technology, and the characteristics of the chip are reported in the paper.

Blake and Hunter propose dynamically generated code for fast Fourier transforms. The Fastest Fourier Transform in the South (FFTS) is a discrete Fourier transform library for x86 and ARM based devices, which was shown to be faster than FFTW, Intel IPP, and Apple vDSP, partly due to its use of program specialization and dynamic code generation. In this work, FFTS has been modified to dynamically exploit streaming store instructions on x86 machines, resulting in speedups of over 10 %. Also, for mobile platforms on which dynamic code generation is prohibited, FFTS has been altered to avoid it while maximizing performance.

In the paper "Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing", Wang, Xiong,

W. J. Gross (*)
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada
e-mail: warren.gross@mcgill.ca
