Abstract
Two-dimensional finite impulse response (FIR) filters are an important component in many image and video processing systems. Processing complex video applications in real time requires high computational power, which field programmable gate arrays (FPGAs) can provide due to their inherent parallelism. The most resource-intensive components in computing FIR filters are the multiplications of the folding operation. This work proposes two optimization techniques for high-speed implementations of the required multiplications with the least possible number of FPGA components. Both methods use integer linear programming formulations which can be solved optimally by standard solvers. In the first method, a formulation for the pipelined multiple constant multiplication problem is presented. In the second method, multiplication structures based on look-up tables are also taken into account. Because the coefficient word size in video processing filters is low, typically 8 to 12 bits, an optimal solution is found for most of the filters in the benchmark used. A complexity reduction of 8.5% for a Xilinx Virtex 6 FPGA was achieved compared to state-of-the-art heuristics.
Highlights
Two-dimensional linear filters with finite impulse response (FIR) are one of the most fundamental operations used in image and video processing
While this is very demanding for a microprocessor or digital signal processor, the inherent parallelism of field programmable gate arrays (FPGAs) can be used to accelerate the FIR operation
The complete convolution matrices are given in Appendix 1; the filter parameters are summarized in Table 1 with their matrix size, word size, the required pipeline stages S using pipelined MCM (PMCM), parameters for their design, and the N_uq unique odd coefficients of their folding matrix
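The notion of unique odd coefficients can be illustrated with a minimal Python sketch. It assumes the common MCM convention that each coefficient is reduced to its odd fundamental by dropping the sign and all trailing zero bits, since sign changes and power-of-two factors cost only a negation or a shift. The function names and the example matrix are hypothetical and are not taken from the paper's benchmark; whether the trivial value 1 is counted may also differ from the paper's convention.

```python
def odd_fundamental(c: int) -> int:
    """Reduce c = +/-(odd) * 2**k to its odd part; 0 stays 0."""
    c = abs(c)
    if c == 0:
        return 0
    while c % 2 == 0:
        c //= 2
    return c


def unique_odd_coefficients(matrix):
    """Return the sorted set of nonzero odd fundamentals of a coefficient matrix."""
    fundamentals = {odd_fundamental(c) for row in matrix for c in row}
    fundamentals.discard(0)  # a zero coefficient needs no multiplier
    return sorted(fundamentals)


# Hypothetical 3x3 coefficient matrix (assumption, for illustration only).
example = [
    [3, 6, 3],
    [6, -20, 6],
    [3, 6, 3],
]
print(unique_odd_coefficients(example))  # 3 and 6 share 3; -20 reduces to 5
```

In this sketch the nine matrix entries collapse to two distinct multipliers, which is why the count of unique odd coefficients, rather than the matrix size, drives the multiplier cost.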
Summary
Two-dimensional linear filters with finite impulse response (FIR) are one of the most fundamental operations used in image and video processing. Compared to infinite impulse response filters, FIR filters are strictly stable, and high-throughput implementations are possible using pipelining because no recursions are involved. However, they are computationally expensive, as many multiply-accumulate (MAC) operations are necessary for each pixel of the resulting image. In method (a), constant multiplications are realized using additions, subtractions, and bit shifts only. These operations form a so-called adder graph, so this method is called the adder graph MCM method in the following. Due to the relatively large routing delays compared to the fast carry chain, a pipelined implementation of the adder graph is necessary to obtain the maximum speed of the FPGA [2,5,6,7,8,9,10]. It was shown by Faust et al. [35] that the LUT-based approach (method b) is competitive with the adder graph method. Results from the optimizations and FPGA synthesis are presented and discussed, followed by a conclusion.
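As a rough illustration of the adder graph idea, the following Python sketch multiplies one input by several constants using only shifts, additions, and subtractions, with intermediate results shared between constants. The constants 5, 19, and 45, the node encoding, and the function name are assumptions chosen for the example; the graph is hand-built and is not the output of the paper's ILP optimization, and pipelining registers are not modeled.

```python
# Each node is (name, src_a, shift_a, sign, src_b, shift_b) and computes
#   value[name] = (value[src_a] << shift_a) + sign * (value[src_b] << shift_b)
# Hypothetical hand-built graph for the constants {5, 19, 45}.
ADDER_GRAPH = [
    ("x5",  "x",  2, +1, "x",  0),  # 5x  = 4x + x
    ("x19", "x5", 2, -1, "x",  0),  # 19x = 4*(5x) - x
    ("x45", "x5", 3, +1, "x5", 0),  # 45x = 8*(5x) + 5x
]


def multiply_by_constants(x: int) -> dict:
    """Evaluate the adder graph for input x without any multiplication."""
    value = {"x": x}
    for name, src_a, shift_a, sign, src_b, shift_b in ADDER_GRAPH:
        a = value[src_a] << shift_a
        b = value[src_b] << shift_b
        value[name] = a + b if sign > 0 else a - b
    return value


result = multiply_by_constants(7)
print(result["x5"], result["x19"], result["x45"])
```

Three adders produce all three products here because the node for 5x is reused by both 19x and 45x; finding the graph with the fewest such nodes (and, on an FPGA, the fewest pipeline registers) is the optimization problem the text refers to.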