Predicting dissolved oxygen (DO) concentrations and identifying their driving factors are essential for preventing and managing anoxia in estuaries. However, complex hydrodynamic conditions and the limitations of traditional methods make it difficult to identify the factors driving low DO. The objective of this study is to develop a robust deep learning model, using four years of in situ data collected from an automatic water quality monitoring station (AWQMS) in an estuary, that accurately identifies and quantifies the factors influencing DO levels. Hypoxia eased during the first two years of the record, but DO concentrations have declined again more recently. The periodicity of DO concentrations in the Pearl River Estuary weakened as hypoxic intensity increased. The maximal information coefficient (MIC) and extreme gradient boosting (XGBoost) were used to determine the significance of the input variables, and the results were then validated with long short-term memory (LSTM) networks. Temperature, pH, conductivity, and NH4+-N concentration were identified as the driving factors of hypoxia. Notably, the proposed hybrid MixUp-LSTM model achieved MAPE = 0.0887 and R2 = 0.9208; compared with the LSTM model, this is a reduction of about 99.34% in MAPE and an increase of about 16.56% in R2, indicating that the MixUp-LSTM model effectively captures the nonlinear relationships between DO and other water quality indicators. Following the existing literature, three traditional statistical methods and four machine learning models were also evaluated for comparison; the proposed MixUp-LSTM model outperformed all of them in prediction accuracy and robustness.
Overall, the successful identification of the driving factors of deoxygenation has important implications for the governance and regulation of low DO in estuaries.
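For readers unfamiliar with the evaluation indices and the augmentation step named above, the following is a minimal sketch, not the study's implementation. It assumes MAPE is reported as a fraction (so 0.0887 would correspond to 8.87%), that R2 is the usual coefficient of determination, and that MixUp forms convex combinations of pairs of training samples with a Beta-distributed weight. The function names, the `alpha` value, and the toy data are all hypothetical.

```python
import random


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (assumed convention)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mixup(xs, ys, alpha=0.2, seed=0):
    """Blend each sample with a randomly chosen partner (MixUp-style augmentation).

    xs: list of feature vectors (e.g. temperature, pH, conductivity, NH4+-N)
    ys: list of DO targets
    alpha: Beta(alpha, alpha) shape parameter (hypothetical value)
    """
    rng = random.Random(seed)
    partners = list(range(len(xs)))
    rng.shuffle(partners)
    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        mixed_x.append([lam * a + (1.0 - lam) * b for a, b in zip(xs[i], xs[j])])
        mixed_y.append(lam * ys[i] + (1.0 - lam) * ys[j])
    return mixed_x, mixed_y


# Toy data, invented purely for illustration.
features = [[25.0, 7.8, 30.1, 0.4], [28.5, 7.5, 22.3, 0.9], [22.1, 8.0, 35.6, 0.2]]
do_obs = [6.2, 3.1, 7.4]
aug_features, aug_do = mixup(features, do_obs)
```

In a pipeline of this kind, the augmented samples would typically be added to the training set only, with MAPE and R2 computed on untouched held-out observations.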