Implementation of a Fully-Parallel Turbo Decoder on a General-Purpose Graphics Processing Unit

An Li, Robert G Maunder, Lajos Hanzo, Bashir M Al-Hashimi

doi:10.1109/access.2016.2586309

Open Access

https://doi.org/10.1109/access.2016.2586309

Journal: IEEE Access
Publication Date: Jan 1, 2016
Citations: 48
License type: CC BY 3.0

Affiliation: University of Southampton

Abstract

Turbo codes comprising a parallel concatenation of upper and lower convolutional codes are widely employed in state-of-the-art wireless communication standards, since they facilitate transmission throughputs that closely approach the channel capacity. However, this necessitates high processing throughputs in order for the turbo code to support real-time communications. In state-of-the-art turbo code implementations, the processing throughput is typically limited by the data dependencies that occur within the forward and backward recursions of the Log-BCJR algorithm, which is employed during turbo decoding. In contrast to the highly serial Log-BCJR turbo decoder, we have recently proposed a novel fully parallel turbo decoder (FPTD) algorithm, which can eliminate the data dependencies and perform fully parallel processing. In this paper, we propose an optimized FPTD algorithm, which reformulates the operation of the FPTD algorithm so that the upper and lower decoders have identical operation, in order to support single instruction multiple data operation. This allows us to develop a novel general purpose graphics processing unit (GPGPU) implementation of the FPTD, which has application in software-defined radios and virtualized cloud-radio access networks. As a benefit of its higher degree of parallelism, we show that our FPTD improves the processing throughput of the Log-BCJR turbo decoder by between 2.3 and 9.2 times, when employing a high-specification GPGPU. However, this is achieved at the cost of a moderate increase of the overall complexity by between 1.7 and 3.3 times.

Highlights

Channel coding has become an essential component in wireless communications, since it is capable of correcting the transmission errors that occur when communicating over noisy channels
1) We propose a beneficial enhancement of the Fully Parallel Turbo Decoder (FPTD) algorithm of [22] so that it supports Single Instruction Multiple Data (SIMD) operation and becomes better suited for implementation on a General Purpose Graphics Processing Unit (GPGPU)
4) We show that when used for implementing the LTE turbo decoder, the proposed SIMD FPTD achieves a degree of parallelism that is between 4 and 24 times higher, representing a processing throughput improvement of between 2.3 and 9.2 times as well as a latency reduction of between 2 and 8.2 times

Summary

INTRODUCTION

Channel coding has become an essential component in wireless communications, since it is capable of correcting the transmission errors that occur when communicating over noisy channels. We previously proposed a Fully-Parallel Turbo Decoder (FPTD) algorithm [22], which dispenses with the serial data dependencies of the conventional Log-BCJR turbo decoder algorithm. This enables every bit in a frame to be processed concurrently, achieving a much higher degree of parallelism than previously demonstrated in the literature. We show that, when used for implementing the LTE turbo decoder, the proposed SIMD FPTD achieves a degree of parallelism that is between 4 and 24 times higher, representing a processing throughput improvement of between 2.3 and 9.2 times as well as a latency reduction of between 2 and 8.2 times. This is achieved at the cost of increasing the overall complexity by a factor of between 1.7 and 3.3.
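The serial data dependency mentioned above can be illustrated with a minimal Python sketch. It is not the FPTD algorithm of [22]: it assumes a toy two-state trellis, the max-log approximation, and made-up branch metrics (`GAMMAS`), and it omits the backward recursion and the exchange between the upper and lower decoders. All function and variable names are hypothetical. `serial_forward` must visit the trellis stages in order, whereas `parallel_forward` updates every stage at once from the values of the previous iteration, so each stage could be handled by its own thread.

```python
NEG_INF = float("-inf")

# Hypothetical branch metrics: GAMMAS[k][s_prev][s_next] for a 3-stage,
# 2-state toy trellis. These values are illustrative only.
GAMMAS = [
    [[0.5, -0.5], [-1.0, 1.0]],
    [[0.2, -0.2], [0.3, -0.3]],
    [[-0.4, 0.4], [0.1, -0.1]],
]


def update_stage(alpha_prev, gamma_k, num_states):
    """Max-log update of one trellis stage from its predecessor's metrics."""
    return [
        max(alpha_prev[s_prev] + gamma_k[s_prev][s_next]
            for s_prev in range(num_states))
        for s_next in range(num_states)
    ]


def serial_forward(gammas, num_states):
    """Conventional forward recursion: stage k+1 waits for stage k."""
    alpha = [[0.0] + [NEG_INF] * (num_states - 1)]  # trellis starts in state 0
    for gamma_k in gammas:
        alpha.append(update_stage(alpha[-1], gamma_k, num_states))
    return alpha


def parallel_forward(gammas, num_states, iterations):
    """Every stage is updated simultaneously from the previous iteration."""
    n = len(gammas)
    alpha = [[0.0] + [NEG_INF] * (num_states - 1)]
    alpha += [[0.0] * num_states for _ in range(n)]  # uninformed start
    for _ in range(iterations):
        snapshot = [row[:] for row in alpha]  # values from the last iteration
        # The iterations of this loop are mutually independent, so on a
        # GPGPU each k could be assigned to a separate thread.
        for k in range(n):
            alpha[k + 1] = update_stage(snapshot[k], gammas[k], num_states)
    return alpha


if __name__ == "__main__":
    print(serial_forward(GAMMAS, 2))
    print(parallel_forward(GAMMAS, 2, iterations=len(GAMMAS)))
```

In this toy setting the parallel schedule reproduces the serial result once the number of iterations reaches the number of stages, since correct values propagate one stage per iteration. It also performs more stage updates in total than the serial recursion, which is in keeping with the trade-off between parallelism and overall complexity described above.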

GPU COMPUTING AND IMPLEMENTATIONS

Operation of the proposed SIMD FPTD algorithm

Mapping the SIMD FPTD algorithm onto a GPGPU

Data arrangement and memory allocation

Pseudo code

RESULTS

BER performance

Degree of parallelism

Processing latency

Processing throughput

Complexity

CONCLUSIONS

Similar Papers

Comparison of Baseband Processors in Terms of Realization SDR-Transceivers
Maksym Serhiyovych Holub
Electronic and Acoustic Engineering | VOL. 3
30 Jun 2020

Adaptive signal processing for multichannel sound using high performance computing
Jorge Lorente Giner
02 Dec 2015

Implementation of a High Throughput 3GPP Turbo Decoder on GPU
Michael Wu ... Joseph R Cavallaro
Journal of Signal Processing Systems | VOL. 65
10 Sep 2011

Performance analysis of Turbo decoder using Soft Output Viterbi Algorithm
Shweta Ramteke ... Sandeep Kakde
01 Apr 2015
