Satellite-observed radiance is a nonlinear functional of surface properties and of atmospheric temperature and absorbing gas profiles, as described by the radiative transfer equation (RTE). In the era of hyperspectral sounders with thousands of high-resolution channels, evaluating the radiative transfer model becomes increasingly time-consuming. In operational numerical weather prediction systems, the performance of the radiative transfer model still limits the number of hyperspectral sounder channels that can be used to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed to facilitate satellite data assimilation. In recent years the programmable commodity graphics processing unit (GPU) has evolved into a highly parallel, multi-threaded, many-core processor with tremendous computational speed and very high memory bandwidth. The radiative transfer model is well suited to GPU implementation because the radiances of many channels can be calculated in parallel, exploiting the hardware's efficiency and parallelism. In this paper, we develop a GPU-based high-performance radiative transfer model for the Infrared Atmospheric Sounding Interferometer (IASI), launched in 2006 onboard METOP-A, the first of the European meteorological polar-orbiting satellites. Each IASI spectrum has 8461 spectral channels. The IASI radiative transfer model consists of three modules. The first module, which computes the regression predictors, takes less than 0.004% of CPU time, while the second module (transmittance computation) and the third module (radiance computation) take approximately 92.5% and 7.5%, respectively. Our GPU-based IASI radiative transfer model is developed to run on a low-cost personal supercomputer with four GPUs and a total of 960 compute cores, delivering nearly 4 TFlops of theoretical peak performance. 
By massively parallelizing the second and third modules, we reached a 364× speedup on 1 GPU and a 1455× speedup on all 4 GPUs, both with respect to the original single-threaded CPU-based Fortran code compiled with -O2 optimization. The 1455× speedup on a computer with four GPUs means that the proposed GPU-based high-performance forward model can compute one day's worth of IASI spectra (1,296,000) in about 10 min, whereas the original single-threaded CPU-based version would take more than 10 days, which is impractical. With asynchronous data transfer, the model sustains over 80% of the theoretical memory bandwidth. We also propose a novel CPU–GPU pipeline implementation of the IASI radiative transfer model. The GPU-based high-performance IASI radiative transfer model is suitable for the assimilation of IASI radiance observations into operational numerical weather forecast models.
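
The figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is an illustration only and is not part of the described model. All function names are hypothetical, "about 10 min" is taken as exactly 10 min, and the Amdahl-style bound assumes that the first module (less than 0.004% of CPU time) is the only part left on the CPU.

```python
# Back-of-the-envelope consistency check of the figures quoted in the abstract.
# All helper names are hypothetical; the inputs are the abstract's own numbers.

MINUTES_PER_DAY = 60 * 24


def cpu_days(gpu_minutes: float, speedup: float) -> float:
    """Single-threaded CPU time (days) implied by a GPU run time and a speedup."""
    return gpu_minutes * speedup / MINUTES_PER_DAY


def scaling_efficiency(speedup_n: float, speedup_1: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n GPUs."""
    return speedup_n / (speedup_1 * n_gpus)


def amdahl_bound(serial_fraction: float) -> float:
    """Upper limit on speedup if `serial_fraction` of the work is not accelerated."""
    return 1.0 / serial_fraction


def spectra_per_second(n_spectra: int, minutes: float) -> float:
    """Throughput implied by processing n_spectra in the given number of minutes."""
    return n_spectra / (minutes * 60.0)


if __name__ == "__main__":
    # Assumption: "about 10 min" is treated as exactly 10 min.
    print(f"implied CPU time:   {cpu_days(10.0, 1455.0):.1f} days")
    print(f"4-GPU scaling:      {scaling_efficiency(1455.0, 364.0, 4):.1%} of linear")
    # Assumption: module 1 (< 0.004% of CPU time) is the only unaccelerated part.
    print(f"Amdahl upper bound: > {amdahl_bound(0.004 / 100):.0f}x")
    print(f"throughput:         {spectra_per_second(1_296_000, 10.0):.0f} spectra/s")
```

Under these assumptions the numbers are mutually consistent: a 10 min run at 1455× corresponds to roughly 10.1 days of CPU time, the 4-GPU speedup is very close to four times the 1-GPU speedup, and leaving the first module unaccelerated still permits a speedup well above the one reported.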