Abstract

The digital down converter (DDC) is a time-intensive and data-intensive computing task and is considered a key technology in software-defined radio. This paper proposes a high-performance implementation of the DDC on a graphics processing unit (GPU) using CUDA. The DDC is composed of a numerically controlled oscillator (NCO) stage, a cascaded integrator-comb (CIC) decimation filter stage, and a finite impulse response (FIR) filter stage. The GPU implementation and optimization of each stage are studied in detail. Additionally, to handle long-duration signals, the signal data sequence is split into segments; the overlap-save and overlap-add mechanisms are applied in the CIC stage and the FIR stage, respectively. Finally, experiments were conducted to evaluate the performance of the GPU-based DDC against a sequential CPU implementation and an OpenMP implementation (16 threads). Experimental results demonstrate that the DDC achieves significant improvements on the GPU: the maximum speedups of the NCO, CIC, and FIR stages exceed 1242, 527, and 179 times, respectively, with timings that include data transfer, kernel execution, and other processing operations, and the overall speedup of the DDC exceeds 180 times. The speedups of the GPU implementation are also far above those of the OpenMP implementation (about 2.5-6.4 times).
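For orientation, the three-stage pipeline named above can be sketched as a plain sequential reference in Python. This is only an illustrative sketch, not the paper's CUDA implementation: the function names (`nco_mix`, `cic_decimate`, `fir_filter`, `ddc`), the three CIC stages, the differential delay of 1, and the gain normalization are all assumptions made for the example, and segment handling (overlap-save, overlap-add) is omitted.

```python
import cmath
import math

def nco_mix(x, f_norm):
    """NCO stage: multiply by exp(-j*2*pi*f_norm*n) to shift f_norm down to 0 Hz.
    f_norm is the carrier frequency in cycles per sample (assumed parameter)."""
    return [s * cmath.exp(-2j * math.pi * f_norm * n) for n, s in enumerate(x)]

def cic_decimate(x, R, N=3):
    """CIC stage: N integrators at the input rate, decimation by R,
    then N combs (differential delay 1) at the output rate."""
    y = list(x)
    for _ in range(N):                 # integrator cascade
        acc, out = 0, []
        for s in y:
            acc += s
            out.append(acc)
        y = out
    y = y[R - 1::R]                    # keep every R-th sample
    for _ in range(N):                 # comb cascade
        prev, out = 0, []
        for s in y:
            out.append(s - prev)
            prev = s
        y = out
    gain = R ** N                      # DC gain of the CIC filter
    return [s / gain for s in y]

def fir_filter(x, taps):
    """FIR stage: direct-form convolution, zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def ddc(x, f_norm, R, taps, N=3):
    """Chain the three stages: NCO mixing, CIC decimation, FIR filtering."""
    return fir_filter(cic_decimate(nco_mix(x, f_norm), R, N), taps)

# Example: a real tone at a quarter of the sample rate, decimated by 4.
signal = [math.cos(2 * math.pi * 0.25 * n) for n in range(64)]
baseband = ddc(signal, f_norm=0.25, R=4, taps=[0.5, 0.5])
```

In this example the mixer moves the tone to 0 Hz with amplitude 0.5, and the image component at half the sample rate falls on a null of the CIC response, so the output settles to a constant once the filter transients have passed. Each stage is written as an independent function because the abstract evaluates the stages separately; on a GPU each would map to its own kernel or group of kernels.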
