The retrieval of global significant wave height (SWH) data is crucial for maritime navigation, aquaculture safety, and oceanographic research. Leveraging the high temporal resolution and spatial coverage of Cyclone Global Navigation Satellite System (CYGNSS) data, machine learning models have shown promise in SWH retrieval. However, existing models struggle with accuracy under high-SWH conditions and discard a significant number of such observations due to low quality, which limits their effectiveness in global SWH retrieval, particularly for monitoring tropical cyclone (TC) events. To address this, this study proposes a daily global SWH retrieval framework based on an enhanced eXtreme Gradient Boosting model (XGBoost-SC), which incorporates Cumulative Distribution Function (CDF) matching to introduce prior distribution information and reduce errors for SWH values exceeding 3 m. An enhanced loss function is employed to improve accuracy and mitigate the distribution bias in low-SWH retrieval induced by CDF matching. The model was tested on over one million sample points and validated against the European Centre for Medium-Range Weather Forecasts (ECMWF) SWH product. With the help of CDF matching, XGBoost-SC outperformed all other models, significantly reducing root mean square error (RMSE) and bias while improving the retrieval capability for high SWHs. For SWH values between 3 and 6 m, the RMSE and bias were 0.94 m and −0.44 m, respectively, and for values above 6 m, they were 2.79 m and −2.0 m. The enhanced performance of XGBoost-SC for large SWHs was further confirmed under TC conditions over the Western North Pacific and the Western Atlantic Ocean. This study provides a reference for large-scale SWH retrieval, particularly under TC conditions.
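The CDF matching step named in the abstract can be illustrated with a generic quantile-mapping sketch. The abstract does not state how CDF matching is wired into XGBoost-SC, so the code below is not the authors' implementation: the function names, the number of quantiles, and the synthetic gamma-distributed "reference" and compressed "predicted" SWH samples are all assumptions made only for illustration.

```python
import numpy as np


def fit_cdf_matching(predicted, reference, n_quantiles=101):
    """Estimate matching quantiles of two samples (hypothetical helper).

    Returns two arrays of equal length: quantiles of the predicted values
    and the quantiles of the reference values at the same probabilities.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    pred_q = np.quantile(predicted, probs)
    ref_q = np.quantile(reference, probs)
    return pred_q, ref_q


def apply_cdf_matching(values, pred_q, ref_q):
    """Map values onto the reference distribution by piecewise-linear
    interpolation between matched quantiles. Values outside the fitted
    range are clipped to the end points by np.interp."""
    return np.interp(values, pred_q, ref_q)


# Synthetic data only (assumption): a skewed reference SWH sample in metres,
# and a "retrieval" that compresses the range and so underestimates high SWH.
rng = np.random.default_rng(0)
reference = rng.gamma(shape=4.0, scale=0.6, size=50_000)
predicted = 0.6 * reference + 0.9 + rng.normal(0.0, 0.2, size=reference.size)

pred_q, ref_q = fit_cdf_matching(predicted, reference)
corrected = apply_cdf_matching(predicted, pred_q, ref_q)

# Compare the upper tail before and after the correction.
p99_ref = np.quantile(reference, 0.99)
p99_pred = np.quantile(predicted, 0.99)
p99_corr = np.quantile(corrected, 0.99)
print(f"99th percentile: reference={p99_ref:.2f} m, "
      f"predicted={p99_pred:.2f} m, corrected={p99_corr:.2f} m")
```

Because the mapping is monotonic, it preserves the ranking of the retrieved values and changes only their distribution. This is also why it can shift the low-SWH end of the distribution, the side effect that the abstract says the enhanced loss function is meant to mitigate.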