A Low-Latency Low-Power QR-Decomposition ASIC Implementation in 0.13 $\mu{\rm m}$ CMOS

Mahdi Shabany, Dimpesh Patel, P. Glenn Gulak

doi:10.1109/tcsi.2012.2215775

Abstract

This paper presents a hybrid QR decomposition (QRD) design that reduces the number of computations and increases their execution parallelism by using a unique combination of multi-dimensional Givens rotations, Householder transformations, and conventional 2-D Givens rotations. A semi-pipelined, semi-iterative architecture is presented for the QRD core. It uses novel design techniques to develop 2-D, Householder 3-D, and 4-D/2-D configurable CORDIC processors that perform the maximum possible number of vectoring and rotation operations within the given number of cycles, while minimizing gate count and maximizing resource utilization. Test results for the $0.3~{\rm mm}^{2}$ QRD chip, fabricated in $0.13~\mu{\rm m}$ 1P8M CMOS technology, demonstrate that the proposed design for $4\times 4$ complex matrices attains the lowest reported processing latency of 40 clock cycles (144 ns) at 278 MHz and dissipates 48.2 mW at a 1.3 V supply and $25^{\circ}{\rm C}$. It outperforms all previously published QRD designs by offering the highest QR processing efficiency.
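The "vectoring" and "rotation" operations mentioned above are the two modes of a CORDIC processor. The following is a minimal floating-point sketch of a conventional 2-D CORDIC used as a Givens rotation. It is the generic textbook algorithm, not the chip's architecture: it omits fixed-point word lengths, pipelining, complex-valued inputs, and the Householder 3-D and 4-D/2-D units. The function names `cordic_vectoring`, `cordic_rotation`, and `givens_cordic` and the iteration count of 16 are illustrative assumptions. Vectoring mode drives the second coordinate of a pair to zero and records the micro-rotation directions. Rotation mode then replays those directions on the remaining column pairs of the same two rows.

```python
import math


def cordic_gain(n_iter):
    """Magnitude growth K = prod_i sqrt(1 + 2^(-2i)) of an n_iter-stage CORDIC."""
    k = 1.0
    for i in range(n_iter):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


def cordic_vectoring(x, y, n_iter=16):
    """Rotate (x, y) toward the positive x-axis with shift-and-add steps.

    Returns the scaled magnitude (K * sqrt(x^2 + y^2)), the near-zero residual y,
    and the rotation directions sigma_i for replay in rotation mode.
    Assumes x >= 0; a full design would pre-rotate by pi otherwise.
    """
    sigmas = []
    for i in range(n_iter):
        sigma = -1 if y > 0 else 1  # choose the direction that shrinks |y|
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        sigmas.append(sigma)
    return x, y, sigmas


def cordic_rotation(x, y, sigmas):
    """Apply the recorded micro-rotations to another (x, y) pair (gain not removed)."""
    for i, sigma in enumerate(sigmas):
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
    return x, y


def givens_cordic(row_a, row_b, n_iter=16):
    """Zero row_b[0] with a 2-D Givens rotation realized by CORDIC (hypothetical helper)."""
    k = cordic_gain(n_iter)
    r, _, sigmas = cordic_vectoring(row_a[0], row_b[0], n_iter)
    new_a, new_b = [r / k], [0.0]  # the annihilated element is forced to exactly zero
    for a, b in zip(row_a[1:], row_b[1:]):
        xa, yb = cordic_rotation(a, b, sigmas)
        new_a.append(xa / k)  # compensate the CORDIC gain
        new_b.append(yb / k)
    return new_a, new_b


if __name__ == "__main__":
    print(givens_cordic([3.0, 1.0], [4.0, 2.0]))
```

For the rows `[3, 1]` and `[4, 2]`, the exact Givens rotation has $c = 3/5$ and $s = 4/5$. It yields the rows $[5,\ 2.2]$ and $[0,\ 0.4]$, and the sketch reproduces these values to within the residual angle error of the 16 micro-rotations. As a separate arithmetic check of the abstract's figures, 40 cycles at 278 MHz correspond to $40 / (278 \times 10^{6}~{\rm Hz}) \approx 143.9$ ns, which is consistent with the reported 144 ns.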
