Abstract

Objective: Efficient and flexible data integration platforms are essential for applying diverse geoscience data to deep learning. Recently, deep learning techniques have been applied to analyze and predict natural phenomena in the Earth sciences using geoscience data. Because geoscience data are considered "big data", data-driven approaches such as deep learning are promising tools for understanding natural phenomena. In this paper, we propose a Geoscience Data Integration Platform (GeoDIP) for managing big geoscience data on High-performance Computing (HPC) cluster systems. Methodology: GeoDIP provides data pre-processing and analysis modules that follow user-defined configurations when creating Artificial Intelligence (AI)-ready datasets. To demonstrate the applicability of GeoDIP, we evaluated precipitation prediction performance using multiple datasets. We collected three datasets from different sources: two from satellite observations and one from reanalysis data for weather forecasting. We then compared the results obtained for each individual dataset with those obtained for the integrated dataset. Results: The integrated dataset generated by GeoDIP improved prediction performance by 9% in F1-score over 2 h. This suggests that combining information on the atmospheric conditions that explain precipitation genesis from multiple data sources is crucial for precipitation prediction. To understand model performance, we conducted a permutation-based feature importance test, which confirmed that upper-level information is important over time. In addition, we evaluated the performance of GeoDIP by comparing sequential and parallel tasks and obtained a performance improvement of approximately 97%. Conclusions: The proposed GeoDIP facilitates the use of multiple datasets to analyze geophysical phenomena, using the parallel processes of HPC clusters to reduce the computational time for data pre-processing and analysis.
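
As an illustration of the permutation-based feature importance test mentioned in the Results, the sketch below shuffles one input feature at a time and records the resulting drop in F1-score. It is a minimal, generic version of the technique and not the GeoDIP implementation: the function names, the toy threshold model `toy_predict`, and the two-feature toy data are all hypothetical and chosen only for illustration.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical model: predicts rain (1) from feature 0 only."""
    return 1 if row[0] > 0.5 else 0


# Toy data (made up): feature 0 drives the label, feature 1 is noise.
X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9], [0.95, 0.5],
     [0.1, 0.2], [0.2, 0.8], [0.3, 0.4], [0.4, 0.6], [0.05, 0.0]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

In this toy setup the model ignores feature 1, so shuffling it leaves the F1-score unchanged and its importance is zero, while shuffling feature 0 degrades the score. A larger mean drop indicates a feature the model relies on more heavily.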
