Abstract

This article proposes a simplified offset min-sum (SOMS) decoding algorithm for QC-LDPC codes. It is an implementation-friendly algorithm based on a new log-likelihood-ratio (LLR) grouping technique that reduces the computational complexity of the QC-LDPC channel decoder. This work also presents a parallel, hardware-efficient QC-LDPC decoder architecture based on the proposed SOMS algorithm. Additional architectural transformations reduce the routing complexity of the decoder and deliver lower latency and higher throughput. A comprehensive performance analysis of the SOMS algorithm is presented under various scenarios based on the 5G-NR specifications. The analysis shows that the SOMS algorithm achieves an adequate frame error rate (FER) of 10⁻⁵ at an SNR of 1.3 dB when decoding a 16-QAM-modulated QC-LDPC code with a code rate of 1/3 and a code length of 26112 bits. The QC-LDPC decoder has been implemented on an FPGA platform (Xilinx Zynq UltraScale+ board), where it operates at a maximum clock frequency of 128.36 MHz. It can be reconfigured to support seven different 5G-NR code-length and code-rate configurations, with code lengths ranging from 10368 to 26112 bits and code rates ranging from 1/3 to 8/9.
This FPGA implementation of the proposed QC-LDPC decoder delivers a peak throughput of 13.3 Gbps and a latency of 0.77 μs when decoding with 10 iterations. Compared with reported implementations, the proposed decoder achieves 7.5× higher throughput and 34% better hardware efficiency than the state of the art.
