Parallel Decoding Research Articles

In wireless communication schemes, turbo codes facilitate near-capacity transmission throughputs by achieving reliable forward error correction. However, owing to the serial data dependencies imposed by the underlying logarithmic Bahl–Cocke–Jelinek–Raviv (Log-BCJR) algorithm, the limited processing throughputs of conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time wireless communication schemes. Motivated by this, we recently proposed a fully parallel turbo decoder (FPTD) algorithm, which eliminates these serial data dependencies, allowing parallel processing and hence offering a significantly higher processing throughput. In this paper, we propose a novel resource-efficient version of the FPTD algorithm, which reduces its computational resource requirement by 50%, thereby enhancing its suitability for field-programmable gate array (FPGA) implementations. We propose a model FPGA implementation. When using a Stratix IV FPGA, the proposed FPTD FPGA implementation achieves an average throughput of 1.53 Gb/s and an average latency of $0.56~\mu\text{s}$ when decoding frames comprising $N=720$ b. These are, respectively, 13.2 times and 11.1 times superior to those of the state-of-the-art FPGA implementation of the Log-BCJR long-term evolution (LTE) turbo decoder, when decoding frames of the same length at the same error correction capability. Furthermore, our proposed FPTD FPGA implementation achieves a normalized resource usage of 0.42 kALUTs/(Mb/s), which is 5.2 times superior to that of the benchmarker decoder. Moreover, when decoding the shortest $N=40$-b LTE frames, the proposed FPTD FPGA implementation achieves an average throughput of 442 Mb/s and an average latency of $0.18~\mu\text{s}$, which are, respectively, 21.1 times and 10.6 times superior to those of the benchmarker decoder. In this case, the normalized resource usage of 0.08 kALUTs/(Mb/s) is 146.4 times superior to that of the benchmarker decoder.
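The serial data dependencies mentioned in the abstract above can be illustrated with a minimal sketch of a forward state-metric recursion. The sketch assumes the max-log simplification (a plain maximum in place of the Jacobian logarithm used by Log-BCJR), a toy two-state trellis that starts in state 0, and made-up branch metrics. The name `forward_state_metrics` and the array layout are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = -math.inf  # marks a trellis transition that does not exist


def forward_state_metrics(gamma, num_states):
    """Max-log forward recursion over a trellis (illustrative sketch).

    gamma[k][s_prev][s_next] is the branch metric of the transition
    s_prev -> s_next at trellis step k.
    """
    # Assumption: the trellis starts in state 0.
    alpha = [[0.0] + [NEG_INF] * (num_states - 1)]
    for k in range(len(gamma)):
        prev = alpha[-1]  # step k needs the finished result of step k-1
        alpha.append([
            max(prev[s_prev] + gamma[k][s_prev][s_next]
                for s_prev in range(num_states))
            for s_next in range(num_states)
        ])
    return alpha


# Toy two-state trellis with three steps (illustrative numbers only).
toy_gamma = [
    [[0.5, -0.5], [NEG_INF, NEG_INF]],
    [[0.2, -0.2], [-1.0, 1.0]],
    [[0.0, 0.3], [0.4, -0.4]],
]
alpha = forward_state_metrics(toy_gamma, num_states=2)
```

Each pass of the loop reads the metrics produced by the previous pass, so the steps cannot run concurrently in this form. This is the kind of dependency that the FPTD algorithm is described as eliminating.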


Turbo codes facilitate near-capacity transmission throughputs by achieving reliable iterative forward error correction. However, owing to the serial data dependence imposed by the logarithmic Bahl–Cocke–Jelinek–Raviv algorithm, the limited processing throughputs of conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time communication schemes. Motivated by this, we recently proposed a floating-point fully parallel turbo decoder (FPTD) algorithm, which eliminates the serial data dependence, allowing parallel processing and hence significantly reducing the number of clock cycles required. In this paper, we conceive a technique for reducing the critical datapath of the FPTD, and we propose a novel fixed-point version as well as its very large scale integration (VLSI) implementation. We also propose a novel technique that allows the FPTD to decode shorter frames employing compatible interleaver patterns. We strike beneficial tradeoffs amongst the latency, core area, and energy consumption by investigating the minimum bit widths and techniques for message log-likelihood ratio scaling and state metric normalization. Accordingly, the design flow and design tradeoffs considered in this paper are also applicable to other fixed-point implementations of error correction decoders. We demonstrate that upon using Taiwan Semiconductor Manufacturing Company (TSMC) 65-nm low-power technology for decoding the longest long-term evolution frames (6144 b) received over an additive white Gaussian noise channel having $E_{b}/N_{0}=1~\text{dB}$, the proposed fixed-point FPTD VLSI achieves a processing throughput of 21.9 Gb/s and a processing latency of $0.28~\mu\text{s}$. These results are 17.1 times superior to those of the state-of-the-art benchmarker. Furthermore, the proposed fixed-point FPTD VLSI achieves an energy consumption of $2.69~\mu\text{J}$/frame and a normalized core area of $5~\text{mm}^{2}/\text{Gb/s}$, which are 34% and 23% lower than those of the benchmarker, respectively.
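The fixed-point techniques named in the abstract above (state metric normalization and log-likelihood ratio scaling under a limited bit width) can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the paper's design: it assumes normalization by subtracting the first state's metric, saturation to a signed two's-complement word, and a scaling factor of 3/4. All function names, the bit width, and the numbers are hypothetical.

```python
def saturate(value, bits):
    """Clip an integer to the range of a signed two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def normalize_state_metrics(metrics, bits):
    """Subtract the first state's metric from every metric, then saturate.

    Only differences between state metrics affect the decoder's decisions,
    so removing a common offset keeps the values within a small word.
    """
    ref = metrics[0]
    return [saturate(m - ref, bits) for m in metrics]


def scale_llr(llr, numerator=3, denominator=4):
    """Scale an integer LLR by numerator/denominator (assumed 3/4 here).

    Floor division rounds toward negative infinity; a hardware design
    might round differently.
    """
    return (llr * numerator) // denominator


# Illustrative values only: four state metrics held in 6-bit words.
normalized = normalize_state_metrics([100, 97, 130, 60], bits=6)
scaled = [scale_llr(8), scale_llr(-10)]
```

In this toy case the last metric falls below the 6-bit minimum of -32 after normalization and is clipped. Choosing a bit width is therefore a tradeoff between word size and how often such clipping occurs.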

Articles published on Parallel Decoding

Stochastic resonance in parallel concatenated turbo code decoding

Single Multiscale-Symbol Error Correction Codes for Multiscale Storage Systems

Efficient Non-Recursive Design of Second-Order Spectral-Null Codes

Data Detection Algorithms for BICM Alternate-Relaying Cooperative Systems With Multiple-Antenna Destination

An Efficient Single and Double-Adjacent Error Correcting Parallel Decoder for the (24,12) Extended Golay Code

Exploiting Thread-Level Parallelism on HEVC by Employing a Reference Dependency Graph

Design of a High-Speed Parallel BCH Decoder for MLC NAND Flash Memory Error Correction

On the Development and Optimization of HEVC Video Decoders Using High-Level Dataflow Modeling

High-Speed Parallel Decodable Nonbinary Single-Error Correcting (SEC) Codes

FTN Transmission System Using Multiple Channel Codes

Low-Rank Methods for Parallelizing Dynamic Programming Algorithms

Design of QPP Interleavers for the Parallel Turbo Decoding Architecture

1.5 Gbit/s FPGA Implementation of a Fully-Parallel Turbo Decoder Designed for Mission-Critical Machine-Type Communication Applications

VLSI Implementation of Fully Parallel LTE Turbo Decoders

Implementation of a Fully-Parallel Turbo Decoder on a General-Purpose Graphics Processing Unit

Insertion Algorithms with Justification for Solving the Resource-Constrained Project Scheduling Problem

Parallel decoding for lattice reduction-aided MIMO Receiver

4K Real-Time and Parallel Software Video Decoder for Multilayer HEVC Extensions

Decoding the view expectation during learned maze navigation from human fronto-parietal network

Parallel Decodable Two-Level Unequal Burst Error Correcting Codes
