Parallel LDPC decoding using CUDA and OpenMP

Joo-Yul Park, Ki-Seok Chung

doi:10.1186/1687-1499-2011-172

Abstract

Digital mobile communication technologies, such as next generation mobile communication and mobile TV, are rapidly advancing. Hardware designs that provide baseband processing for new protocol standards are being actively attempted. However, because multiple standards are emerging concurrently and the demands on device functions are diverse, hardware-only implementation may have reached a limit. To overcome this challenge, digital communication system designs are adopting software solutions that use central processing units (CPUs) or graphics processing units (GPUs) to implement communication protocols. In this article we propose a parallel software implementation of low-density parity-check (LDPC) decoding algorithms that uses a multi-core processor and a GPU to achieve both flexibility and high performance. Specifically, we use OpenMP to parallelize software on the multi-core processor and Compute Unified Device Architecture (CUDA) for the parallel software running on the GPU. We process the H-matrix information using OpenMP pragmas on the multi-core processor and execute the decoding algorithms in parallel using CUDA on the GPU. We evaluated the performance of the proposed implementation at two different code rates for the China Multimedia Mobile Broadcasting (CMMB) standard, and we verified that the proposed implementation satisfies the CMMB bandwidth requirement.

Highlights

Today, wireless devices transmit and receive high-rate data in real time
Software implementations of communication protocols using central processing units (CPUs) or graphics processing units (GPUs) are rapidly being adopted in digital communication system designs
We describe a software design that implements parallel processing of low-density parity-check (LDPC) decoding algorithms

Summary

Introduction

Wireless devices transmit and receive high-rate data in real time. To offer various multimedia services with 4G mobile communication systems, there is a growing need for high transmission rates combined with reliability. It is very challenging to design decoder hardware that supports various standards and multiple data rates. A related study proposed a method for LDPC decoding using Compute Unified Device Architecture (CUDA) [13], and its authors showed that a GPU could reduce decoding time dramatically. We extend this parallelization so that various standards and code rates can be supported seamlessly. To support various code rates, the host multi-core CPU reads the H-matrix and, using OpenMP, generates address patterns that help the GPU execute the LDPC decoding effectively in parallel.
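The host-side step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this article: the `AddressPattern` structure, the `build_address_pattern` function, and the dense row-major in-memory layout of H are all assumptions made for illustration, and reading the H-matrix file is left out because its format is not given here. The sketch shows one plausible way to turn the rows of H into a compressed list of bit-node indices per check node, with OpenMP pragmas parallelizing the per-row work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed layout (CSR-style), assumed for illustration:
// check node r (row r of H) is connected to the bit nodes
// col_index[row_start[r]] .. col_index[row_start[r + 1] - 1].
struct AddressPattern {
    std::vector<int> row_start;  // size rows + 1
    std::vector<int> col_index;  // one entry per nonzero of H
};

// Builds the pattern from a dense 0/1 H-matrix stored row-major.
// The dense in-memory input is an assumption; the file format is not modeled.
AddressPattern build_address_pattern(const std::vector<unsigned char>& h,
                                     int rows, int cols) {
    assert(h.size() == static_cast<std::size_t>(rows) * cols);

    AddressPattern p;
    p.row_start.assign(rows + 1, 0);

    // Pass 1: count the nonzeros of each row. Rows are independent, so the
    // loop can be shared among threads; each thread writes distinct elements.
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        int count = 0;
        for (int c = 0; c < cols; ++c) {
            count += (h[static_cast<std::size_t>(r) * cols + c] != 0);
        }
        p.row_start[r + 1] = count;
    }

    // Serial prefix sum turns the per-row counts into start offsets.
    for (int r = 0; r < rows; ++r) {
        p.row_start[r + 1] += p.row_start[r];
    }

    p.col_index.resize(p.row_start[rows]);

    // Pass 2: each row writes its column indices into its own slice.
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        int out = p.row_start[r];
        for (int c = 0; c < cols; ++c) {
            if (h[static_cast<std::size_t>(r) * cols + c] != 0) {
                p.col_index[out++] = c;
            }
        }
    }
    return p;
}
```

Built with an OpenMP-enabled compiler (for example with `-fopenmp`), the two loops run across the available cores; without OpenMP the pragmas are ignored and the code runs serially with the same result. The two flat arrays could then be copied to GPU memory so that each decoding thread looks up the addresses of the messages it needs, which is the role the address patterns play in the text above.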

Background

H Matrix File Read

Conclusion