Abstract

Fast, recursive algorithms are derived from the proposed factorizations of discrete cosine transform (DCT) matrices. In this paper, signal flow graphs for the n-point DCT II and DCT IV algorithms are introduced. The proposed algorithms yield exactly the same results as standard DCT algorithms but are faster. The arithmetic complexity and stability of the algorithms are explored, and the improvements these algorithms offer are compared against previously existing fast and stable DCT algorithms. A parallel hardware computing architecture for the DCT II algorithm is proposed. The architecture is first designed, simulated, and prototyped on a 40-nm Xilinx Virtex-6 FPGA and then mapped to custom integrated circuit technology using 0.18-µm CMOS standard cells from Austria Micro Systems. A performance trade-off exists between computational precision, chip area, clock speed, and power consumption; this trade-off is explored in both the FPGA and the custom CMOS implementation spaces. An example FPGA implementation operates at clock frequencies above 230 MHz for several values of system word size, yielding real-time throughput of more than 230 million 16-point DCTs per second. The custom CMOS results are based on the synthesis and place-and-route steps of the design flow; physical silicon fabrication was not conducted due to prohibitive cost.
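
To illustrate how a DCT II can be computed recursively through a DCT IV, here is a minimal Python sketch of the classical even/odd split. For even n with m = n/2, the even-indexed outputs of an n-point DCT II equal an m-point DCT II of x_j + x_{n-1-j}, and the odd-indexed outputs equal an m-point DCT IV of x_j - x_{n-1-j}. This is a well-known textbook decomposition assumed here for illustration; it is not necessarily the factorization proposed in the paper. The sketch uses unnormalized transforms, computes the DCT IV half directly rather than recursively, and all function names are hypothetical.

```python
import math

def dct2_direct(x):
    """Unnormalized DCT-II by definition: X_k = sum_j x_j cos(pi*(2j+1)*k / (2n)). O(n^2)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
            for k in range(n)]

def dct4_direct(x):
    """Unnormalized DCT-IV by definition: Y_k = sum_j x_j cos(pi*(2j+1)*(2k+1) / (4n)). O(n^2)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (2 * j + 1) * (2 * k + 1) / (4 * n)) for j in range(n))
            for k in range(n)]

def dct2_recursive(x):
    """DCT-II via the even/odd split: one half-size DCT-II plus one half-size DCT-IV."""
    n = len(x)
    if n == 1:
        return [float(x[0])]      # length-1 DCT-II is the identity, since cos(0) = 1
    if n % 2:
        return dct2_direct(x)     # odd length: no split available, fall back to the definition
    m = n // 2
    u = [x[j] + x[n - 1 - j] for j in range(m)]  # symmetric part feeds the even outputs
    v = [x[j] - x[n - 1 - j] for j in range(m)]  # antisymmetric part feeds the odd outputs
    out = [0.0] * n
    out[0::2] = dct2_recursive(u)  # X_{2k}   = DCT-II_m(u)_k
    out[1::2] = dct4_direct(v)     # X_{2k+1} = DCT-IV_m(v)_k (computed directly in this sketch)
    return out

if __name__ == "__main__":
    x = [float(i) for i in range(16)]  # a 16-point input, matching the abstract's example size
    fast, ref = dct2_recursive(x), dct2_direct(x)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

Because the split is an exact algebraic identity, the recursive and direct versions differ only by floating-point rounding; the printed maximum difference is on the order of machine precision.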
