FPGA-Based Hardware Matrix Inversion Architecture Using Hybrid Piecewise Polynomial Approximation Systolic Cells

Javier Vázquez-Castillo, Omar Longoria-Gandara, Jaime Ortegón-Aguilar, Roberto Carrasco-Alvarez, Alejandro Castillo-Atoche

doi:10.3390/electronics9010182

Open Access

Abstract

Matrix inversion hardware based on QR decomposition with Givens Rotations (GR) and a back substitution (BS) block is required by many signal processing algorithms. However, GR hardware requires the implementation of complex operations, such as the reciprocal square root (RSR), which is typically implemented using LookUp Tables (LUTs) or COordinate Rotation DIgital Computers (CORDICs), among others, leading to either high area consumption or low throughput. This paper introduces a Field-Programmable Gate Array (FPGA)-based full matrix inversion architecture using hybrid piecewise polynomial approximation systolic cells. In the design, a hybrid segmentation technique is incorporated for the implementation of the piecewise polynomial systolic cells. This hybrid approach is composed of an external and an internal segmentation, where the first is nonuniform and the second is uniform; it fits the curve shape of the complex functions, achieving a better signal-to-quantization-noise ratio, and it also improves the time performance and area resources. Experimental results reveal a well-balanced improvement in the design, achieving high throughput and less resource utilization in comparison to state-of-the-art FPGA-based architectures. In our study, the proposed design achieves 7.51 Mega-Matrices per second for 4 × 4 matrix operations with a latency of 12 clock cycles; meanwhile, the hardware design requires only 1474 slice registers and 1458 LUTs in a Virtex-5 XC5VLX220T FPGA, and 1474 slice registers and 1378 LUTs when a Virtex-6 XC6VLX240T FPGA is used.
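The hybrid segmentation idea can be illustrated with a small floating-point sketch. It is not the authors' cell design: the input range [1/16, 1), the power-of-two outer boundaries, the eight inner sub-segments, and the second-order Taylor coefficients are all assumptions chosen for illustration, and fixed-point quantization is not modeled.

```python
import math

# Assumed parameters (not taken from the paper).
OUTER_EXPONENTS = range(-4, 0)  # outer segments [2**e, 2**(e+1)): nonuniform widths
INNER_COUNT = 8                 # uniform sub-segments inside each outer segment


def build_table():
    """Precompute (midpoint, c0, c1, c2) for every (outer, inner) sub-segment."""
    table = {}
    for e in OUTER_EXPONENTS:
        lo = 2.0 ** e
        width = lo / INNER_COUNT  # the outer segment [lo, 2*lo) has width lo
        for i in range(INNER_COUNT):
            m = lo + (i + 0.5) * width
            # Second-order Taylor coefficients of f(x) = x**-0.5 about m.
            c0 = m ** -0.5
            c1 = -0.5 * m ** -1.5
            c2 = 0.375 * m ** -2.5
            table[(e, i)] = (m, c0, c1, c2)
    return table


TABLE = build_table()


def rsr_approx(x):
    """Approximate 1/sqrt(x) for x in [2**-4, 1) with a piecewise polynomial."""
    if not (2.0 ** OUTER_EXPONENTS[0] <= x < 1.0):
        raise ValueError("x outside the assumed input range [1/16, 1)")
    # frexp gives x = mant * 2**exp with 0.5 <= mant < 1, so floor(log2(x)) = exp - 1.
    e = math.frexp(x)[1] - 1
    lo = 2.0 ** e
    i = int((x - lo) / lo * INNER_COUNT)  # uniform inner index, 0 .. INNER_COUNT - 1
    m, c0, c1, c2 = TABLE[(e, i)]
    d = x - m
    return c0 + d * (c1 + d * c2)  # Horner evaluation of the local polynomial
```

Because the outer segments narrow toward small inputs, where 1/sqrt(x) is steepest, the relative error of this sketch stays roughly the same in every outer segment while the table holds only 32 coefficient sets. In hardware, the outer index would typically come from a leading-bit detector and the inner index from the next few bits; that mapping is likewise an assumption here.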

Highlights

Matrix inversion is one of the most useful operations in many signal processing algorithms (SPA), where efficient and accurate computation of this operation is required
The numbers of slice registers and slice lookup tables (LUTs) are balanced with the DSP48E resources in order to allocate the designed architecture in the Field-Programmable Gate Array (FPGA)
The design using only slice registers requires more than three times the number of resources than the one utilizing both FPGA

Summary

Introduction

Matrix inversion is one of the most useful operations in many signal processing algorithms (SPA), where efficient and accurate computation of this operation is required. The hardware (HW) design of the standard GR requires the implementation of complex operations, such as the square root (SR) and its reciprocal, the reciprocal square root (RSR). In this sense, see the recent studies reported in References [8,9,10,11]. CORDIC implementations have proven their efficiency for computing complex operations such as SR, RSR, number division, sine, and cosine, among others [12]. The accuracy of a CORDIC implementation depends on the number of iterations of the algorithm [13].
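To make the role of the RSR concrete, the following minimal sketch computes the Givens rotation coefficients with a single reciprocal square root and two multiplications, then applies the rotation to zero the second element of a pair. The function names are hypothetical, and `rsr` is a floating-point stand-in for whatever hardware cell (LUT, CORDIC, or polynomial approximation) would supply the value.

```python
import math


def rsr(x):
    """Stand-in for a hardware reciprocal-square-root cell (assumed helper)."""
    return 1.0 / math.sqrt(x)


def givens_coefficients(a, b):
    """Return (c, s) such that rotating the pair (a, b) zeroes its second element."""
    n = a * a + b * b
    if n == 0.0:
        return 1.0, 0.0  # nothing to rotate; use the identity
    r = rsr(n)           # one RSR replaces a square root followed by a division
    return a * r, b * r


def apply_givens(c, s, x, y):
    """Apply the rotation [[c, s], [-s, c]] to the pair (x, y)."""
    return c * x + s * y, -s * x + c * y


c, s = givens_coefficients(3.0, 4.0)
top, bottom = apply_givens(c, s, 3.0, 4.0)  # top is close to 5, bottom close to 0
```

In a QR decomposition, the same (c, s) pair would be applied to the remaining elements of the two rows being rotated, which is why the accuracy and latency of the RSR affect the whole array.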

Methods

Results

Conclusion