Abstract

Modern graphics cards provide computational capabilities that exceed those of current CPUs. As a computationally intensive problem, numerical weather prediction stands to benefit from the massive number of threads and the large memory throughput of the graphics architecture. In this paper, we present the key steps for integrating the Compute Unified Device Architecture (CUDA) programming framework into one key component of numerical weather prediction, the data assimilation algorithm, which incorporates observational data into the model to produce the best initial condition for the next prediction. The data assimilation algorithm studied in this paper exhibits good localization and lends itself to parallelism. To maximize the throughput of the graphics card, we use over a million CUDA threads, global memory coalescing, and fast on-chip shared memory. We also demonstrate the differences arising from the advancement of GPU architectures from the GTX 200 series to Fermi. The experiments are carried out separately on a GTX 260 (GTX 200 series) and a GTX 460 (Fermi) graphics card. Results show a speedup of 72.1× on the GTX 260 and 92.7× on the GTX 460. The results provide attractive evidence for applying CUDA GPUs to highly demanding scientific computing. Copyright © 2011 John Wiley & Sons, Ltd.
