Abstract
Long‐term eddy covariance (EC) data are crucial for understanding the impact of global change on ecosystem functions. However, EC data often contain long gaps, particularly in tropical dry forests (TDFs), due to seasonality and El Niño‐Southern Oscillation (ENSO) phases. These factors create high variability, complex dependencies, and dynamic flux footprints. No current gap‐filling method adequately addresses long gaps in TDFs. This study introduces a novel framework for addressing this issue by (a) defining gap sizes by their relative percentages, (b) training, tuning, and evaluating two machine learning (ML) models, MissForest for short gaps and Prophet for intermediate and long gaps, and (c) predicting half‐hourly EC data from 2013 to 2022 for six EC variables, whose actual gaps accounted for 26.6% to 28.4% of the data, at a TDF site in Costa Rica. Results indicate that MissForest performed best at filling short gaps (≤5%; R² = 0.76 and Nash‐Sutcliffe efficiency (NSE) = 0.71), while Prophet performed well for gaps between 5% and 10% (R² = 0.72 and NSE = 0.67). However, both models struggled with gaps between 10% and 13%. Validation showed R² values of 0.79, 0.88, and 0.77 for CO₂ flux, sensible heat flux, and latent heat flux, respectively, with corresponding NSE values of 0.78, 0.86, and 0.72, and a normalized root mean squared error (NRMSE) of around 2E‐4. To further validate our results, we applied our approach at three EC sites with different ecological conditions, where it demonstrated robust performance. This study presents a reliable ML approach for imputing long gaps in EC data, which can be applied to sites with strong variability.
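For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how a Nash‐Sutcliffe efficiency and a relative gap percentage might be computed. The function names `nse` and `gap_percentage` are hypothetical, and expressing a gap as the share of missing half‐hourly records in a series is an assumption made for illustration; the abstract does not spell out the study's exact definition.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.

    A value of 1 is a perfect fit; 0 means the prediction is no better
    than the mean of the observations. Pairs containing NaN are skipped.
    """
    pairs = [
        (o, p)
        for o, p in zip(observed, predicted)
        if not (math.isnan(o) or math.isnan(p))
    ]
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    residual = sum((o - p) ** 2 for o, p in pairs)
    variance = sum((o - mean_obs) ** 2 for o, _ in pairs)
    return 1.0 - residual / variance


def gap_percentage(series):
    """Share of missing (NaN) records in a series, in percent.

    Assumption: gap size is measured relative to the series length.
    """
    missing = sum(1 for value in series if math.isnan(value))
    return 100.0 * missing / len(series)
```

Under this convention, a series with half of its records missing has a gap of 50%, and a model that always predicts the observed mean scores an NSE of 0.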