STREAMFLOW AND SOIL MOISTURE FORECASTING WITH HYBRID DATA INTELLIGENT MACHINE LEARNING APPROACHES: CASE STUDIES IN THE AUSTRALIAN MURRAY–DARLING BASIN
For a drought-prone agricultural nation such as Australia, hydro-meteorological imbalances and increasing demand for water resources are severely constraining terrestrial water reservoirs and regional-scale agricultural productivity. Two important components of the terrestrial water reservoir, i.e., streamflow water level (SWL) and soil moisture (SM), are imperative for both agricultural and hydrological applications. Forecasted SWL and SM can enable prudent and sustainable decision-making for agriculture and water resources management. To feasibly emulate SWL and SM, data-intelligent machine learning models are a promising tool in today's rapidly advancing data science era. Yet the naturally chaotic characteristics of hydro-meteorological variables, which can exhibit non-linear and non-stationary behavior within the model dataset, are a key challenge for non-tuned machine learning models. Another important issue that could confound model accuracy or applicability is the selection of relevant features to emulate SWL and SM: too few inputs can provide insufficient information to construct an accurate model, while an excessive number of redundant inputs could impair the performance of the simulation algorithm. This research thesis focuses on the development of hybridized data-intelligent models for forecasting SWL and SM in the upper layer (surface to 0.2 m) and the lower layer (0.2–1.5 m depth) within the agricultural region of the Murray-Darling Basin, Australia. The SWL quantifies the availability of surface water resources, while the upper layer SM (or the surface SM) is important for surface runoff, evaporation, and energy exchange at the Earth-atmosphere interface. The lower layer (or the root zone) SM is essential for groundwater recharge, plant uptake and transpiration.
This research study is built upon four primary objectives designed for the forecasting of SWL and SM, with subsequent robust evaluations by means of statistical metrics in tandem with diagnostic plots of observed and modeled datasets. The first objective establishes the importance of feature selection (or optimization) in the forecasting of monthly SWL at three study sites within the Murray-Darling Basin. An artificial neural network (ANN) model optimized with the iterative input selection (IIS) algorithm, named IIS-ANN, is developed, whereby the IIS algorithm achieves feature optimization. The IIS-ANN model outperforms the standalone models, and a further hybridization is performed by integrating a non-decimated and advanced maximum overlap discrete wavelet transformation (MODWT) technique. The IIS-selected inputs are transformed into wavelet subseries via MODWT to unveil the embedded features, leading to the IIS-W-ANN model. The IIS-W-ANN outperforms the comparative IIS-W-M5 Model Tree, IIS-based and standalone models. In the second objective, improved self-adaptive multi-resolution analysis (MRA) techniques, namely ensemble empirical mode decomposition (EEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), are utilized to address the non-stationarity issues in forecasting monthly upper and lower layer soil moisture at seven sites. The SM time-series are decomposed using EEMD/CEEMDAN into their respective intrinsic mode functions (IMFs) and residual components. The significant lags identified by the partial autocorrelation function are then utilized as inputs to the extreme learning machine (ELM) and random forest (RF) models. The hybrid EEMD-ELM yielded better results in comparison to the CEEMDAN-ELM, EEMD-RF, CEEMDAN-RF and the classical ELM and RF models.
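The final step of the second objective (lagged values fed to an ELM) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the EEMD/CEEMDAN decomposition and the partial autocorrelation lag selection are omitted, the lags are supplied by hand, and the names `make_lagged` and `SimpleELM`, the sigmoid activation, the hidden-layer size and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def make_lagged(series, lags):
    """Build inputs X (one column per lag) and target y from a 1-D series."""
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    # Column for lag k holds series[t - k] for every target index t >= max_lag.
    X = np.column_stack([series[max_lag - k: len(series) - k] for k in lags])
    y = series[max_lag:]
    return X, y

class SimpleELM:
    """Single-hidden-layer ELM: random hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation (an assumed choice for this sketch).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Hidden weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Illustrative synthetic seasonal series (not data from the thesis).
t = np.arange(120)
noise = 0.05 * np.random.default_rng(1).normal(size=t.size)
series = np.sin(2 * np.pi * t / 12) + noise

X, y = make_lagged(series, lags=[1, 2, 12])
split = int(0.7 * len(y))  # chronological train/test split
model = SimpleELM(n_hidden=20).fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

In a decomposition-based hybrid, one such model would typically be fitted per IMF and residual component and the component forecasts summed; that aggregation step is likewise left out of this sketch.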
Since SM is contingent upon many influential meteorological, hydrological and atmospheric parameters, for the third objective sixty predictor inputs are collated in forecasting upper and lower layer soil moisture at four sites. An ANN-based ensemble committee of models (ANN-CoM) is developed, integrating a two-phase feature optimization via a Neighborhood Component Analysis-based feature selection algorithm for regression (fsrnca) and a basic ELM. The ANN-CoM shows better predictive performance in comparison to the standalone second-order Volterra, M5 Model Tree, RF, and ELM models. In the fourth objective, a new multivariate sequential EEMD-based modelling approach is developed. The multivariate sequential EEMD is an advancement of the classical single-input EEMD approach, achieving a further methodological improvement. This multivariate approach is developed to allow for the utilization of multiple inputs in forecasting SM. The multivariate sequential EEMD, optimized with the cross-correlation function and the Boruta feature selection algorithm, is integrated with the ELM model in emulating weekly SM at four sites. The resulting hybrid multivariate sequential EEMD-Boruta-ELM attained a better performance in comparison with the multivariate adaptive regression splines (MARS) counterpart (EEMD-Boruta-MARS) and standalone ELM and MARS models. The research study ascertains the applicability of feature selection algorithms integrated with appropriate MRA for improved hydrological forecasting. Forecasting at shorter and near-real-time horizons (i.e., weekly) would help reinforce scientific tenets in designing knowledge-based systems for precision agriculture and climate change adaptation policy formulations.
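The cross-correlation screening mentioned for the fourth objective can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis procedure: the function name `significant_lags` is hypothetical, the threshold is the usual approximate 95% band of ±1.96/√N for uncorrelated series, the data are synthetic, and the Boruta step is not reproduced.

```python
import numpy as np

def significant_lags(x, y, max_lag=12):
    """Return (lag, r) pairs where corr(x[t - lag], y[t]) exceeds the ~95% band."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bound = 1.96 / np.sqrt(len(y))  # approximate significance band
    selected = []
    for k in range(1, max_lag + 1):
        # Pair the predictor at time t - k with the target at time t.
        r = np.corrcoef(x[:-k], y[k:])[0, 1]
        if abs(r) > bound:
            selected.append((k, float(r)))
    return selected

# Illustrative data: the target follows the predictor with a 3-step delay.
rng = np.random.default_rng(0)
predictor = rng.normal(size=300)
target = np.roll(predictor, 3) + 0.1 * rng.normal(size=300)

for lag, r in significant_lags(predictor, target, max_lag=6):
    print(f"lag {lag}: r = {r:+.3f}")
```

Lags that pass the band would then be candidate inputs for a subsequent feature selection stage; a few lags may pass by chance, which is one reason a second filter is useful.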
- # Extreme Learning Machine Models
- # Streamflow Water Level
- # Upper Layer
- # Random Forest Models
- # Soil Moisture
- # Extreme Learning Machine
- # Ensemble Empirical Mode Decomposition
- # Complete Ensemble Empirical Mode Decomposition With Adaptive Noise
- # Maximum Overlap Discrete Wavelet Transformation
- # Availability Of Surface Water Resources
- Research Article
216
- 10.1016/j.geoderma.2018.05.035
- Jun 6, 2018
- Geoderma
Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition
- Research Article
93
- 10.1016/j.envres.2017.01.035
- Mar 10, 2017
- Environmental Research
Very short-term reactive forecasting of the solar ultraviolet index using an extreme learning machine integrated with the solar zenith angle
- Research Article
61
- 10.1002/wer.1642
- Oct 4, 2021
- Water Environment Research
Stream waters play a crucial role in catering to the world's needs with the required quality of water. Due to the discharges of wastewater from the various point and nonpoint sources, most of the watersheds are contaminated easily. The Upper Green River watershed in Kentucky, USA, is one such watershed that is contaminated over the years due to the runoff from rural areas and agricultural lands and combined sewer overflows (CSOs) from urban areas. Monitoring and characterizing the water quality status of streams in such watersheds has become of great importance, with multivariate statistical techniques such as regression, factor analysis, cluster analysis, and artificial intelligence methods such as artificial neural networks (ANNs). The water quality parameters, namely, fecal coliform (FC), turbidity, pH, and conductivity have been predicted quantitatively using ANNs to understand the water quality status of streams in the Upper Green River watershed elsewhere. In this study, a novel attempt has been made to predict the status of the quality of the Green River water with the predictive capabilities of a few decision tree (DT) algorithms such as classification and regression tree (CART) model, multivariate adaptive regression splines (MARS) model, random forest (RF) model, and extreme learning machine (ELM) model. The RF model's performance is better in predicting FC, turbidity, and pH than CART models in training and testing phases. Relatively, MARS and ELM models did better in testing though the performance is poorer in training. For example, we obtain the RMSE values of 2206, 2532, 1533, and 1969 using RF, CART, MARS, and ELM for FC in testing. A good correlation has been observed between conductivity and temperature, precipitation, and land-use factors for the MARS model. Overall, DT models are helpful in understanding, interpreting the outcomes, and visualizing the results compared with the other models. 
PRACTITIONER POINTS: The prediction of stream water quality parameters using decision trees is explored. The climate and land use parameters are used as input parameters to the modeling. The DT models of CART, MARS, RF, and ANNs such as ELM are explored to predict stream water quality. The RF model shows stable results compared with CART, MARS, and ELM for the data explored. Apart from the R² value, RMSE and MAE indicate the effectiveness of DTs in prediction.
- Research Article
92
- 10.1016/j.engstruct.2019.05.048
- May 28, 2019
- Engineering Structures
Load-carrying capacity and mode failure simulation of beam-column joint connection: Application of self-tuning machine learning model
- Research Article
212
- 10.1016/j.compag.2018.10.014
- Oct 30, 2018
- Computers and Electronics in Agriculture
Artificial intelligence approach for the prediction of Robusta coffee yield using soil fertility properties
- Research Article
407
- 10.1016/j.advengsoft.2017.09.004
- Sep 21, 2017
- Advances in Engineering Software
Predicting compressive strength of lightweight foamed concrete using extreme learning machine model
- Research Article
100
- 10.1016/j.apenergy.2017.10.076
- Nov 4, 2017
- Applied Energy
An efficient neuro-evolutionary hybrid modelling mechanism for the estimation of daily global solar radiation in the Sunshine State of Australia
- Research Article
181
- 10.1016/j.geoderma.2018.11.044
- Nov 29, 2018
- Geoderma
Estimation of soil temperature from meteorological data using different machine learning models
- Research Article
231
- 10.1016/j.rser.2019.01.014
- Jan 22, 2019
- Renewable and Sustainable Energy Reviews
Significant wave height forecasting via an extreme learning machine model integrated with improved complete ensemble empirical mode decomposition
- Research Article
86
- 10.1016/j.compag.2018.07.008
- Jul 20, 2018
- Computers and Electronics in Agriculture
Survey of different data-intelligent modeling strategies for forecasting air temperature using geographic information as model predictors
- Addendum
1
- 10.1007/s10661-016-5186-6
- Mar 7, 2016
- Environmental Monitoring and Assessment
A predictive model for streamflow has practical implications for understanding drought hydrology, environmental monitoring and agriculture, ecosystems and resource management. In this study, the state-of-the-art extreme learning machine (ELM) model was utilized to simulate the mean streamflow water level (Q_WL) for three hydrological sites in eastern Queensland (Gowrie Creek, Albert River, and Mary River). The performance of the ELM model was benchmarked against the artificial neural network (ANN) model. The ELM model is a fast computational method using single-layer feedforward neural networks and randomly determined hidden neurons that learns the historical patterns embedded in the input variables. A set of nine predictors comprising the month (to consider the seasonality of Q_WL); rainfall; Southern Oscillation Index; Pacific Decadal Oscillation Index; ENSO Modoki Index; Indian Ocean Dipole Index; and Nino 3.0, Nino 3.4, and Nino 4.0 sea surface temperatures (SSTs) was utilized. A selection of variables was performed using cross correlation with Q_WL, yielding the best inputs defined by (month; P; Nino 3.0 SST; Nino 4.0 SST; Southern Oscillation Index (SOI); ENSO Modoki Index (EMI)) for Gowrie Creek, (month; P; SOI; Pacific Decadal Oscillation (PDO); Indian Ocean Dipole (IOD); EMI) for Albert River, and (month; P; Nino 3.4 SST; Nino 4.0 SST; SOI; EMI) for the Mary River site. A three-layer neuronal structure was trialed with activation functions defined by sigmoid, logarithmic, tangent sigmoid, sine, hardlim, triangular, and radial basis, resulting in an optimum ELM model with the hard-limit function and architecture 6-106-1 (Gowrie Creek), 6-74-1 (Albert River), and 6-146-1 (Mary River). The alternative ELM and ANN models with two inputs (month and rainfall) and the ELM model with all nine inputs were also developed.
The performance was evaluated using the mean absolute error (MAE), coefficient of determination (R²), Willmott's Index (d), peak deviation (P_dv), and Nash-Sutcliffe coefficient (E_NS). The results verified that the ELM model was more accurate than the ANN model. Inputting the best input variables improved the performance of both models, where the optimum ELM yielded R² ≈ (0.964, 0.957, and 0.997), d ≈ (0.968, 0.982, and 0.986), and MAE ≈ (0.053, 0.023, and 0.079) for Gowrie Creek, Albert River, and Mary River, respectively, and the optimum ANN model yielded smaller R² and d and larger simulation errors. When all inputs were utilized, simulations were consistently worse with R² (0.732, 0.859, and 0.932 (Gowrie Creek), d (0.802, 0.876, and 0.903 (Albert River), and MAE (0.144, 0.049, and 0.222 (Mary River) although they were relatively better than using the month and rainfall as inputs. Also, with the best input combinations, the frequency of simulation errors fell in the smallest error bracket. Therefore, it can be ascertained that the ELM model offered an efficient approach for the streamflow simulation and, therefore, can be explored for its practicality in hydrological modeling.
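Most of the evaluation metrics named above have standard closed forms, sketched here in NumPy. The function names are hypothetical, the coefficient of determination is taken as the squared Pearson correlation (an assumption about the definition used), and peak deviation is left out because its exact definition is not given in the abstract.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(sim - obs)))

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

def willmott_d(obs, sim):
    """Willmott's index of agreement, d = 1 - SSE / potential error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    o_mean = obs.mean()
    potential = np.sum((np.abs(sim - o_mean) + np.abs(obs - o_mean)) ** 2)
    return float(1.0 - np.sum((obs - sim) ** 2) / potential)

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency, 1 - SSE / variance of observations about their mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Illustrative values only (not results from the study).
observed = np.array([0.42, 0.55, 0.61, 0.48, 0.39])
simulated = np.array([0.40, 0.57, 0.58, 0.50, 0.41])
print(mae(observed, simulated), r_squared(observed, simulated))
print(willmott_d(observed, simulated), nash_sutcliffe(observed, simulated))
```

MAE is in the units of the target variable, whereas d and the Nash-Sutcliffe coefficient are dimensionless and equal 1 for a perfect simulation, which is why such studies usually report them together.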
- Research Article
172
- 10.1007/s10661-016-5094-9
- Jan 16, 2016
- Environmental Monitoring and Assessment
- Research Article
69
- 10.1016/j.jhydrol.2023.129460
- Mar 31, 2023
- Journal of Hydrology
An enhanced monthly runoff time series prediction using extreme learning machine optimized by salp swarm algorithm based on time varying filtering based empirical mode decomposition
- Research Article
109
- 10.1007/s11356-017-9283-z
- May 30, 2017
- Environmental Science and Pollution Research
In this paper, several extreme learning machine (ELM) models, including standard extreme learning machine with sigmoid activation function (S-ELM), extreme learning machine with radial basis activation function (R-ELM), online sequential extreme learning machine (OS-ELM), and optimally pruned extreme learning machine (OP-ELM), are newly applied for predicting dissolved oxygen concentration with and without water quality variables as predictors. Firstly, using data from eight United States Geological Survey (USGS) stations located in different river basins in the USA, the S-ELM, R-ELM, OS-ELM, and OP-ELM were compared against the measured dissolved oxygen (DO) using four water quality variables, water temperature, specific conductance, turbidity, and pH, as predictors. For each station, we used data measured at an hourly time step for a period of 4 years. The dataset was divided into a training set (70%) and a validation set (30%). We selected several combinations of the water quality variables as inputs for each ELM model and six different scenarios were compared. Secondly, an attempt was made to predict DO concentration without water quality variables. To achieve this goal, we used the year numbers, 2008, 2009, etc., month numbers from (1) to (12), day numbers from (1) to (31) and hour numbers from (00:00) to (24:00) as predictors. Thirdly, the best ELM models were trained using the validation dataset and tested with the training dataset. The performances of the four ELM models were evaluated using four statistical indices: the coefficient of correlation (R), the Nash-Sutcliffe efficiency (NSE), the root mean squared error (RMSE), and the mean absolute error (MAE).
Results obtained from the eight stations indicated that: (i) the best results were obtained by the S-ELM, R-ELM, OS-ELM, and OP-ELM models having four water quality variables as predictors; (ii) out of eight stations, the OP-ELM performed better than the other three ELM models at seven stations while the R-ELM performed the best at one station. The OS-ELM models performed the worst and provided the lowest accuracy; (iii) for predicting DO without water quality variables, the R-ELM performed the best at seven stations followed by the S-ELM in the second place and the OP-ELM performed the worst with low accuracy; (iv) for the final application where training ELM models with validation dataset and testing with training dataset, the OP-ELM provided the best accuracy using water quality variables and the R-ELM performed the best at all eight stations without water quality variables. Fourthly, and finally, we compared the results obtained from different ELM models with those obtained using multiple linear regression (MLR) and multilayer perceptron neural network (MLPNN). Results obtained using MLPNN and MLR models reveal that: (i) using water quality variables as predictors, the MLR performed the worst and provided the lowest accuracy in all stations; (ii) MLPNN was ranked in the second place at two stations, in the third place at four stations, and finally, in the fourth place at two stations, (iii) for predicting DO without water quality variables, MLPNN is ranked in the second place at five stations, and ranked in the third, fourth, and fifth places in the remaining three stations, while MLR was ranked in the last place with very low accuracy at all stations. Overall, the results suggest that the ELM is more effective than the MLPNN and MLR for modelling DO concentration in river ecosystems.
- Research Article
72
- 10.1016/j.ijhydene.2017.04.084
- May 1, 2017
- International Journal of Hydrogen Energy
Comparison of artificial intelligence and empirical models for estimation of daily diffuse solar radiation in North China Plain