Parallel Implementation of K-Best Quadrature Amplitude Modulation Detection for Massive Multiple Input Multiple Output Systems

Bhargav Gokalgandhi, Jonathan Ling, Zoran Latinović, Dragan Samardzija, Ivan Seskar

doi:10.3390/electronics13142775

Abstract

Massive MIMO (Multiple Input Multiple Output) systems impose significant processing burdens along with strict latency requirements. The combination of large-scale antenna arrays and the wide bandwidths required by next-generation wireless systems creates an exponential increase in frontend-to-backend data. Balancing processing latency and reliability is critical for baseband processing tasks such as QAM detection. While linear detection algorithms have low computational complexity, their error performance degrades heavily in Massive MIMO scenarios. Nonlinear detection methods such as Maximum Likelihood (ML) and Sphere Decoding have good error performance, but they suffer from high, variable, and uncontrollable computational complexity. In such cases, the K-best QAM detection algorithm can provide the required control over system performance while maintaining near-ML error performance. In this paper, both hard-output and soft-output K-best QAM detection are implemented on a CPU by utilizing multiple cores combined with vector processing. Similarly, hard-output detection is implemented on a GPU by leveraging its SIMD (Single Instruction, Multiple Data) architecture and warp-based execution model. The processing time per bit and the energy consumption per bit of the CPU and GPU implementations are compared across QAM constellation densities and MIMO array sizes. For typical QAM constellations such as 4-, 16-, and 64-QAM, the GPU implementation shows up to a 5× improvement in processing latency per bit and up to a 120× improvement in energy consumption per bit over the CPU implementation. For larger MIMO configurations such as 24 × 24 and 32 × 32, the GPU implementation also shows up to a 125× improvement in energy consumption per bit over the CPU implementation. Finally, the soft-output detector is combined with an LDPC (Low-Density Parity Check) decoder to obtain the FER (Frame Error Rate) performance of the CPU implementation. The FER is then combined with the frame processing latency to form a Goodput metric that demonstrates the tradeoff between latency and reliability.
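
To make the fixed-complexity property of K-best concrete, the hard-output tree search can be sketched in a few lines. This is a minimal, unoptimized Python/NumPy sketch, not the authors' multi-core vectorized CPU or warp-based GPU implementation. It assumes a complex-valued model y = Hx + n with at least as many receive as transmit antennas, square QAM on an unnormalized odd-integer grid, and the hypothetical helper names `qam_constellation` and `k_best_detect`. After a QR decomposition of H, it walks the tree from the last layer to the first, expands each of the K survivors with every constellation point, and keeps the K paths with the smallest accumulated partial Euclidean distance.

```python
import numpy as np


def qam_constellation(m: int) -> np.ndarray:
    """Hypothetical helper: square M-QAM on an unnormalized odd-integer grid.

    For m=16 the points are {-3,-1,1,3} + j{-3,-1,1,3}.
    """
    side = int(round(np.sqrt(m)))
    if side * side != m:
        raise ValueError("this sketch only supports square QAM orders")
    levels = np.arange(-(side - 1), side, 2)
    return (levels[:, None] + 1j * levels[None, :]).ravel()


def k_best_detect(H: np.ndarray, y: np.ndarray,
                  constellation: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical hard-output K-best detector (breadth-first tree search).

    Assumes H has shape (n_rx, n_tx) with n_rx >= n_tx.
    """
    n_tx = H.shape[1]
    Q, R = np.linalg.qr(H)      # H = Q R, with R upper-triangular (n_tx x n_tx)
    z = Q.conj().T @ y          # z = R x + Q^H n
    # Each survivor: (accumulated partial Euclidean distance, symbols for layers i..n_tx-1)
    survivors = [(0.0, [])]
    for i in range(n_tx - 1, -1, -1):
        candidates = []
        for ped, partial in survivors:
            # Contribution of the already-decided layers i+1..n_tx-1 (partial[0] is layer i+1)
            interference = sum(R[i, i + 1 + j] * s for j, s in enumerate(partial))
            for s in constellation:
                increment = abs(z[i] - interference - R[i, i] * s) ** 2
                candidates.append((ped + increment, [s] + partial))
        # Exactly K * M candidates per layer; keep the K smallest distances
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:k]
    # Best surviving path; index 0 corresponds to transmit layer 0
    return np.array(survivors[0][1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    const = qam_constellation(16)
    H = (rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))) / np.sqrt(2)
    x = rng.choice(const, size=4)
    noise = 0.05 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
    x_hat = k_best_detect(H, H @ x + noise, const, k=8)
    print("symbols recovered:", np.allclose(x_hat, x))
```

Because every layer evaluates exactly K·M candidates, the work per received vector is fixed by K rather than by the channel realization or noise level, which is the controllability the abstract contrasts with sphere decoding; increasing K moves the error performance toward ML at a proportional cost.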
