Abstract

The processing speed of a radar echo coherent accumulation system is an important factor in the real-time performance of space target detection. In this paper, we design a radar echo coherent accumulation system on an NVIDIA V100 GPU that uses half precision (FP16) and Tensor Cores to accelerate processing. The design comprises optimizing the coherent accumulation process, designing a scaling coefficient, and using the tcFFT library to implement the FFT with the WMMA method. We compare the speed of the coherent accumulation system under three configurations: FP32, FP16, and FP16 with Tensor Cores. In the FP32 and FP16 configurations, the FFT is performed with the cuFFT library; in the FP16 Tensor Core configuration, it is performed with the tcFFT library. Speed is measured with Nsight Compute. The test results show that: (a) creating an FFT plan takes less time in tcFFT than in cuFFT; (b) for a single batch, FP16 achieves a 1.18X-1.39X speedup over FP32 across the whole coherent accumulation process; for multiple batches, we propose a parallel batch processing method, with which FP16 with Tensor Cores achieves a 2.23X-3.17X speedup over FP16 in the two-dimensional FFT and a 1.54X-1.77X speedup across the whole coherent accumulation process.
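The abstract does not say how the scaling coefficient is chosen. One plausible motivation is the narrow range of FP16, whose largest finite value is 65504: coherently accumulating N pulses of amplitude A yields a peak near N·A, which can overflow. The following is a minimal sketch of that effect, not the paper's implementation. It emulates FP16 rounding in pure Python through the `struct` module's `"e"` format instead of using a GPU, cuFFT, or tcFFT. The helper names (`to_fp16`, `make_pulses`, `doppler_dft_fp16`) and the choice of 1/N as the scaling coefficient are assumptions made for illustration only.

```python
import cmath
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    struct raises OverflowError when x is too large for FP16;
    we map that case to +/-inf, as FP16 hardware would.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def q(z: complex) -> complex:
    """Quantize the real and imaginary parts of z to FP16."""
    return complex(to_fp16(z.real), to_fp16(z.imag))


def make_pulses(n: int, amplitude: float, doppler_bin: int) -> list:
    """Hypothetical test signal: one target with a constant Doppler shift."""
    return [amplitude * cmath.exp(2j * cmath.pi * doppler_bin * i / n)
            for i in range(n)]


def doppler_dft_fp16(pulses: list, scale: float) -> list:
    """Naive DFT across pulses with every intermediate rounded to FP16.

    `scale` is applied to each input sample before accumulation
    (assumed here to play the role of a scaling coefficient).
    """
    n = len(pulses)
    out = []
    for k in range(n):
        acc = 0j
        for i, x in enumerate(pulses):
            w = q(cmath.exp(-2j * cmath.pi * k * i / n))  # twiddle factor
            acc = q(acc + q(q(x * scale) * w))
        out.append(acc)
    return out


n_pulses = 64
pulses = make_pulses(n_pulses, amplitude=2000.0, doppler_bin=5)

# Without scaling, the peak (about 64 * 2000 = 128000) exceeds the FP16 range.
unscaled = doppler_dft_fp16(pulses, scale=1.0)

# With scale = 1/N, the peak stays near the input amplitude.
scaled = doppler_dft_fp16(pulses, scale=1.0 / n_pulses)
peak_bin = max(range(n_pulses), key=lambda k: abs(scaled[k]))

print("unscaled peak:", abs(unscaled[5]))
print("scaled peak bin:", peak_bin, "magnitude:", abs(scaled[peak_bin]))
```

In this sketch the unscaled accumulation saturates to infinity at the target's Doppler bin, while the scaled version keeps the peak within range at the cost of some rounding error. A real design would presumably balance overflow against the loss of precision for weak echoes when picking the coefficient.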
