Traditional methods for monitoring greenhouse gas (GHG) fluxes at the water-air interface have many limitations, and analyses of the factors influencing these fluxes have relied mainly on mathematical statistics. This study offers a relatively novel alternative: prediction and analysis based on machine learning. Two classical machine learning algorithms, random forest (RF) and support vector machine (SVM), and two deep learning algorithms, convolutional neural network (CNN) and long short-term memory neural network (LSTM), were applied to predict CO<sub>2</sub> and CH<sub>4</sub> diffusion fluxes at the reservoir water-air interface from environmental factors. In addition, the feature importance assessment in RF and the classical decision tree (DT) algorithm were used to mine and analyze the relationship between environmental factors and reservoir GHG diffusion fluxes from a new perspective. The results showed that both deep learning algorithms predicted well, while among the classical machine learning algorithms RF performed significantly better than SVM. LSTM and RF yielded the best prediction accuracy for CO<sub>2</sub> diffusion flux and CH<sub>4</sub> diffusion flux, respectively: the root mean square error (RMSE) was 0.424 mmol/(m<sup>2</sup>·h) and 0.140 μmol/(m<sup>2</sup>·h), and the R<sup>2</sup> between predicted and measured values was 0.960 and 0.758, respectively. The RF feature importance assessment indicated that sediment factors and nutrient factors were the key factors influencing CO<sub>2</sub> and CH<sub>4</sub> diffusion fluxes, followed by climate factors and water environment factors. A decision tree was then used to delineate the limiting thresholds of the environmental factors that determine whether the water body acts as a source or a sink of CO<sub>2</sub>. The decision tree classified all samples in this study correctly (100% accuracy), and its results also showed that low dissolved inorganic carbon concentrations and alkaline conditions favor the water body acting as a CO<sub>2</sub> sink. These results demonstrate the great potential of machine learning algorithms for predicting and analyzing GHG fluxes at the water-air interface in reservoirs.
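The two accuracy metrics reported in the abstract, RMSE and R<sup>2</sup>, can be illustrated with a minimal sketch. The function names and the flux values below are hypothetical and are not taken from the study. The abstract also does not state which definition of R<sup>2</sup> was used; the sketch assumes the coefficient of determination, 1 - SS_res / SS_tot.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical flux values for illustration only, not data from the study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

RMSE carries the units of the flux itself, which is why the abstract reports it in mmol/(m<sup>2</sup>·h) for CO<sub>2</sub> and μmol/(m<sup>2</sup>·h) for CH<sub>4</sub>, whereas R<sup>2</sup> is dimensionless.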