Abstract
SummaryDigital down converter (DDC) is a time‐intensive and data‐intensive computing task and considered as the key technology in software defined radio. This paper proposes a high‐performance implementation of DDC on a graphics processing unit (GPU) using CUDA, which is composed of a numerically controlled oscillator stage, a cascaded integrator‐comb (CIC) decimation filter stage, and a finite impulse response (FIR) filter stage. The GPU implementation and optimizing of all the stages are studied in detail. Additionally, for handling a long‐duration signal, the signal data sequence is truncated into segments; the overlap‐save and overlap‐add mechanisms were applied in CIC stage and FIR stage, respectively. Finally, experiments were conducted to evaluate the performance of GPU‐based DDC with respect to a sequential version CPU implementation and an OpenMP implementation (16 threads). Experimental results demonstrate that the DDC achieves significant improvements on the GPU; the maximum speed ups in numerically controlled oscillator stage, CIC stage, and FIR stage can achieve more than 1242, 527, and 179 times, including data‐transfer, kernel execution, and other processing operations; the overall speed up of DDC can achieve more than 180. In the meantime, the speed ups of GPU implementation are far above the OpenMP implementation (about 2.5‐6.4 times).
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Concurrency and Computation: Practice and Experience
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.