The Mekong River Basin (MRB) supports the livelihoods of over 60 million people across six Southeast Asian countries. Understanding long-term sediment changes is essential for management and contingency planning, but sediment concentration data in the MRB are extremely sparse and irregularly sampled, which makes analysis challenging. This study reconstructs long-term suspended sediment concentration (SSC) data using a novel semi-supervised machine learning (ML) model. The key idea is to exploit the abundant hydroclimate data, rather than relying solely on sediment concentration data, to reduce overfitting during training and thereby improve the accuracy of the ML models. Extensive experiments on daily hydroclimate and SSC data collected from 1979 to 2019 at three main stations (Chiang Saen, Nong Khai, and Mukdahan) show that the proposed method outperforms state-of-the-art supervised techniques (Random Forest, XGBoost, CatBoost, MLP, CNN, and LSTM) as well as existing semi-supervised methods (CoReg, Π Model, ICT, and Mean Teacher). This approach is the first semi-supervised method for reconstructing sediment data in the field and has the potential for broader application in other river systems.