Abstract
Using a specific input-restructuring sequence, a new VLSI algorithm and architecture have been derived for a high-throughput, memory-based systolic array implementation of the discrete cosine transform (DCT). The proposed restructuring technique transforms the DCT algorithm into a cycle convolution and a pseudo-cycle convolution, which serve as the basic computational forms. The solution has been specifically designed for good fixed-point error performance, which is exploited to further reduce hardware complexity and power consumption, and it leads to a ROM-based VLSI kernel with good quantization properties. The resulting parallel VLSI algorithm and architecture are therefore well suited to a memory-based, fixed-point implementation. The proposed algorithm can be mapped onto two linear systolic arrays of similar length and form, which can then be efficiently merged into a single array using an appropriate hardware-sharing technique. A highly efficient VLSI chip can thus be obtained, with appealing features such as good architectural topology, processing speed, hardware complexity, and I/O cost. Moreover, the proposed solution substantially reduces the hardware overhead of the pre-processing stage, which for short-length DCTs consumes a significant percentage of the chip area.
Highlights
The discrete cosine transform (DCT) and discrete sine transform (DST) [1,2,3] are key elements in many digital signal processing applications, being good approximations to the statistically optimal Karhunen-Loeve transform [2, 3].
The proposed algorithm and its associated VLSI architecture have good numerical properties that can be exploited to obtain a low-complexity hardware implementation with low power consumption. The algorithm uses a cycle convolution and a pseudo-cycle convolution structure that can be efficiently mapped onto two linear systolic arrays of the same form and length, using a small number of I/O channels placed at the two ends of the array.
A new memory-based design approach is presented. It is based on a new reformulation of the DCT with good quantization properties and leads to a high-throughput VLSI implementation with reduced hardware complexity.
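As a rough illustration of what "memory-based" means in this context, the sketch below replaces a multiplication by a fixed coefficient with a look-up in a precomputed table, which is the role a ROM plays in hardware. It is a minimal software sketch under stated assumptions, not the architecture proposed in the paper: the function names `build_rom` and `rom_multiply`, the 8-bit input width, the 8 fractional bits, and the coefficient cos(pi/8) are all hypothetical choices made for the example.

```python
import math


def build_rom(coeff: float, in_bits: int, frac_bits: int) -> list[int]:
    """Precompute round(coeff * x * 2**frac_bits) for every signed in_bits-wide x.

    Hypothetical helper: models a ROM whose address is the input sample and
    whose content is the fixed-point product with one constant coefficient.
    """
    scale = 1 << frac_bits
    size = 1 << in_bits
    rom = []
    for addr in range(size):
        # Interpret the address as a two's-complement signed integer.
        x = addr - size if addr >= size // 2 else addr
        rom.append(round(coeff * x * scale))
    return rom


def rom_multiply(rom: list[int], x: int, in_bits: int) -> int:
    """Look up the fixed-point product for a signed input x (no multiplier used)."""
    return rom[x & ((1 << in_bits) - 1)]


# Assumed example parameters: 8-bit samples, 8 fractional bits, one cosine constant.
IN_BITS, FRAC_BITS = 8, 8
COEFF = math.cos(math.pi / 8)
ROM = build_rom(COEFF, IN_BITS, FRAC_BITS)

# Worst-case quantization error of the table over all possible inputs.
max_err = max(
    abs(rom_multiply(ROM, x, IN_BITS) / (1 << FRAC_BITS) - COEFF * x)
    for x in range(-128, 128)
)
```

In this sketch the only error source is the rounding of each stored product, so the error per look-up is bounded by half of one least significant bit of the output word. The table size grows as 2 to the power of the input width, which is why the word length that a formulation tolerates matters for the cost of a memory-based design.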
Summary
The discrete cosine transform (DCT) and discrete sine transform (DST) [1,2,3] are key elements in many digital signal processing applications, being good approximations to the statistically optimal Karhunen-Loeve transform [2, 3]. Regular computational structures such as cyclic convolution and circular correlation have been used to obtain efficient VLSI implementations [17,18,19], using modular and regular architectural paradigms such as distributed arithmetic [20] or systolic arrays [21]. This approach leads to low I/O cost, reduced hardware complexity, high speed, and a regular and modular hardware structure. The proposed algorithm and its associated VLSI architecture have good numerical properties that can be exploited to obtain a low-complexity hardware implementation with low power consumption. The algorithm uses a cycle convolution and a pseudo-cycle convolution structure that can be efficiently mapped onto two linear systolic arrays of the same form and length, using a small number of I/O channels placed at the two ends of the array.
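To make the two computational forms more concrete, the sketch below computes a cyclic convolution and a sign-modified variant of it. The exact definition of the pseudo-cycle convolution is not given in this summary, so the second function is an assumption: it uses a negacyclic convolution (terms that wrap around the end of the sequence are negated) purely as a stand-in for "a cyclic-convolution-like structure with some sign changes". The function names are hypothetical.

```python
def cyclic_convolution(x: list[int], h: list[int]) -> list[int]:
    """y[k] = sum_i x[i] * h[(k - i) mod N], for two sequences of equal length N."""
    n = len(x)
    return [sum(x[i] * h[(k - i) % n] for i in range(n)) for k in range(n)]


def negacyclic_convolution(x: list[int], h: list[int]) -> list[int]:
    """Same index pattern as the cyclic form, but wrapped terms are negated.

    Assumed stand-in for a sign-modified cyclic structure; the pseudo-cycle
    convolution used in the paper may differ in which terms change sign.
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = 0
        for i in range(n):
            term = x[i] * h[(k - i) % n]
            # i > k means the index k - i wrapped around modulo N.
            acc += term if i <= k else -term
        out.append(acc)
    return out


x = [1, 2, 3]
h = [0, 1, 0]  # a one-sample delay kernel
y_cyclic = cyclic_convolution(x, h)
y_negacyclic = negacyclic_convolution(x, h)
```

Both functions visit the same products in the same order and differ only in sign handling. This shared data flow suggests why two such structures could map onto arrays of the same form and length, as the text describes.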