Precipitation nowcasting, which involves the short-term, high-resolution prediction of rainfall, plays a crucial role in various real-world applications. In recent years, researchers have increasingly utilized deep learning-based methods in precipitation nowcasting. The exponential growth of spatiotemporal observation data has heightened interest in recent advancements such as denoising diffusion models, which offer appealing prospects due to their inherent probabilistic nature that aligns well with the complexities of weather forecasting. Successful application of diffusion models in rainfall prediction tasks requires relevant conditions and effective utilization to direct the forecasting process of the diffusion model. In this paper, we propose a probabilistic spatiotemporal model for precipitation nowcasting, named LLMDiff. The architecture of LLMDiff includes two networks: a conditional encoder-decoder network and a denoising network. The conditional network provides conditional information to guide the denoising network for high-quality predictions related to real-world earth systems. Additionally, we utilize a frozen transformer block from pre-trained large language models (LLMs) in the denoising network as a universal visual encoder layer, which enables the accurate estimation of motion trend by considering long-term temporal context information and capturing temporal dependencies within the frame sequence. Our experimental results demonstrate that LLMDiff outperforms state-of-the-art models on the SEVIR dataset.
Read full abstract