Abstract

Data gaps are a recurring challenge in climate research, hindering effective time series analysis and modeling. This study proposes a novel two-step data imputation framework for temperature time series with a long continuous gap surrounded by predictor stations with sporadic missingness. The method first applies iterative gap-filling Singular Spectrum Analysis (SSA) to the small sporadic gaps, and then applies multivariate techniques to the long continuous gap: Inverse Distance Weightage (IDW), Kriging, Spatial Regression Test (SRT), Point Estimation method of Biased Sentinel Hospital-based Area Disease Estimation (P-BSHADE), Random Forest (RF), Support Vector Machines (SVM), and MissForest (MF). To prioritize accuracy, comprehensive cross-validation with class-based statistical indicators is employed to minimize potential biases introduced by the imputation process. The study shows the effectiveness of SSA in filling small sporadic gaps using an optimal window length (M ≈ 365 days) and eigentriple grouping (ET = 30). Notably, for maximum temperature, P-BSHADE and SVM achieve strong accuracy (e.g., Legates's Coefficient of Efficiency (LCE) of 0.75–0.44 and Combined Performance Index (CPI) of 6.3 %–19.1 %), attributed to their ability to capture spatial and/or temporal heterogeneity. While SRT and P-BSHADE offer acceptable performance for minimum temperature (e.g., LCE of 0.51–0.27 and CPI of 0.7 %–23.7 %), the study also uncovers a complex interplay between missing data, predictor stations, and autocorrelation that affects imputation accuracy. This suggests that the reduced performance of certain techniques likely stems from the decline in spatial and spatiotemporal autocorrelation between the target station and its predictors.
Overall, this study presents a promising framework for handling complex missing data scenarios often encountered in climate time series analysis, paving the way for more robust and reliable analysis and modeling.
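
As an illustration of the first step, the following is a minimal Python sketch of iterative SSA gap filling on a single series. It is not the study's implementation: the function names (ssa_reconstruct, ssa_fill_gaps), the stopping rule (max_iter, tol), and the use of a fixed mean of the observed values are assumptions made for this example. It also assumes that the eigentriple grouping refers to keeping the leading components of the decomposition.

```python
import numpy as np


def ssa_reconstruct(x, window, n_components):
    """Rank-limited SSA reconstruction of a gap-free 1-D series."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: entry [i, j] equals x[i + j].
    traj = np.lib.stride_tricks.sliding_window_view(x, window).T
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, len(s))
    approx = (u[:, :r] * s[:r]) @ vt[:r]
    # Diagonal averaging: average all matrix entries that map to one time step.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts


def ssa_fill_gaps(series, window, n_components, max_iter=100, tol=1e-6):
    """Fill NaN gaps by repeatedly replacing them with the SSA reconstruction."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    if not missing.any():
        return x
    mean = np.nanmean(x)   # mean of observed values (kept fixed: a simplification)
    x[missing] = mean      # initial guess for the gaps
    for _ in range(max_iter):
        recon = ssa_reconstruct(x - mean, window, n_components) + mean
        change = np.max(np.abs(recon[missing] - x[missing]))
        x[missing] = recon[missing]   # observed values are never modified
        if change < tol:
            break
    return x
```

Under these assumptions, a daily temperature series would be filled with a call such as ssa_fill_gaps(series, window=365, n_components=30), mirroring the window length and eigentriple count reported above. The filled predictor series would then serve as input to the multivariate techniques in the second step.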
