Optimizing coastal groundwater quality predictions: A novel data mining framework with cross-validation, bootstrapping, and entropy analysis.
- Research Article
14
- 10.1080/15275922.2019.1566292
- Jan 2, 2019
- Environmental Forensics
Redundancy analysis for characterizing the groundwater quality in coastal industrial areas
- Research Article
- 10.62554/d4ffgh98
- Nov 30, 2024
- ALTAIR: Jurnal Transportasi dan Bahari
Groundwater quality in coastal areas is sensitive to the interaction of fresh water with seawater. Groundwater quality in coastal areas needs to be monitored regularly to prevent natural disasters and clean water crises in the future. In this study, a physical parameter of groundwater, electrical conductivity, was measured to analyze groundwater quality. An interpolation method based on Geographic Information Systems (GIS) was used to estimate values between sample points and produce a spatial distribution map of groundwater quality over the research location. The results showed that groundwater quality in the coastal area of Durung Village was generally still quite good, with conductivity values between 689 and 3,391 µS/cm. However, locations near the coast, such as the Malahayati Merchant Marine Polytechnic, showed signs of seawater contamination, with the highest conductivity values between 5,763 and 6,270 µS/cm.
- Research Article
23
- 10.1080/00405000.2017.1279004
- Jan 12, 2017
- The Journal of The Textile Institute
The aim of this paper was to predict the colour strength of viscose knitted fabrics by using fuzzy logic (FL) model based on dye concentration, salt concentration and alkali concentration as input variables. Moreover, the performance of fuzzy logic (FL) model is compared with that of artificial neural network (ANN) model. In addition, same parameters and data have been used in ANN model. From the experimental study, it was found that dye concentration has the main and greatest effects on the colour strength of viscose knitted fabrics. The coefficient of determination (R2), root mean square (RMS) and mean absolute errors (MAE) between the experimental colour strength and that predicted by FL model are found to be 0.977, 1.025 and 4.61%, respectively. Further, the coefficient of determination (R2), root mean square (RMS) and mean absolute errors (MAE) between the experimental colour strength and that predicted by ANN model are found to be 0.992, 0.726 and 3.28%, respectively. It was found that both ANN and FL models have ability and accuracy to predict the fabric colour strength effectively in non-linear domain. However, ANN prediction model shows higher prediction accuracy than that of Fuzzy model.
- Research Article
39
- 10.4236/gep.2017.53008
- Jan 1, 2017
- Journal of Geoscience and Environment Protection
With respect to groundwater deterioration from human activities a unique situation of co-disposal of non-engineered Municipal Solid Waste (MSW) dumping and Secondary Wastewater (SWW) disposal on land prevails simultaneously within the same campus at Puducherry in India. Broadly the objective of the study is to apply and compare Artificial Neural Network (ANN) and Multi Linear Regression (MLR) models on groundwater quality applying Canadian Water Quality Index (CWQI). Totally, 1065 water samples from 68 bore wells were collected for two years on monthly basis and tested for 17 physio-chemical and bacteriological parameters. However the study was restricted to the pollution aspects of 10 physio-chemical parameters such as EC, TDS, TH, , Cl-, , Na+, Ca2+, Mg2+ and K+. As there is wide spatial variation (2 to 3 km radius) with ground elevation (more than 45 m) among the bore wells it is appropriate to study the groundwater quality using Multivariate Statistical Analysis and ANN. The selected ten parameters were subjected to Hierarchical Cluster Analysis (HCA) and the clustering procedure generated three well defined clusters. Cluster wise important physio-chemical attributes which were altered by MSW and SWW operations, are statistically assessed. The CWQI was evolved with the objective to deliver a mechanism for interpreting the water quality data for all three clusters. The ANOVA test results viz., F-statistic (F = 134.55) and p-value (p = 0.000 < 0.05) showed that there are significant changes in the average values of CWQI among the three clusters, thereby confirming the formation of clusters due to anthropogenic activities. The CWQI simulation was performed using MLR and ANN models for all three clusters. Totally, 1 MLR and 9 ANN models were considered for simulation. Further the performances of ten models were compared using R2, RMSE and MAE (quantitative indicators). 
The analyses of the results revealed that both MLR and ANN models were fairly good in predicting the CWQI in Clusters 1 and 2 with high R2, low RMSE and MAE values but in Cluster 3 only ANN model fared well. Thus this study will be very useful to decision makers in solving water quality problems.
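The three indicators used above to compare the MLR and ANN models (R2, RMSE and MAE) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the CWQI values are made up, and R2 is taken here as 1 - SS_res/SS_tot, which is only one of the definitions in use (some studies report the squared Pearson correlation instead).

```python
import math

def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae

# Hypothetical CWQI scores, for illustration only.
cwqi_observed = [62.0, 70.0, 55.0, 81.0]
cwqi_predicted = [60.0, 72.0, 57.0, 79.0]
r2, rmse, mae = regression_metrics(cwqi_observed, cwqi_predicted)
print(f"R2={r2:.3f} RMSE={rmse:.2f} MAE={mae:.2f}")
```

Running the same function on each cluster and each candidate model gives a like-for-like comparison of the kind described in the abstract.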
- Research Article
26
- 10.1080/10298436.2020.1766688
- May 22, 2020
- International Journal of Pavement Engineering
This study presents analytical models to predict the compressive strength of roller-compacted concrete pavement (RCCP) containing steel slag aggregate and fly ash. Based on the experimental results, three models were established: multiple regression analysis (MRA), artificial neural networks (ANN) and fuzzy logic (FL). In the RCCP mixtures, cement was partially substituted by fly ash at four levels: 10%, 20%, 30%, and 40%; natural coarse aggregate was replaced by steel slag aggregate at ratios of 50% and 100%. The compressive strength was determined at 3-, 7-, 28-, 56- and 91-day ages. 75 sets of testing data were collected to build the target value set. With the same seven input variables, the MRA model is less reliable than the ANN model in terms of predicting the compressive strength of RCCP. Besides, the use of triangular membership functions with three input variables (fly ash content, steel slag aggregate content and age) in the FL algorithm is sufficient to obtain accurate results. The performance of the FL model is as good as that of the ANN model. Additionally, a total of 33 fuzzy rules found for building the FL model can be applied to predict the compressive strength of RCCP. Highlights: MRA, ANN, and FL were used to construct the models for predicting the compressive strength of RCCP containing steel slag aggregate and fly ash. The ANN model and FL model produced reliable results in predicting the strength of RCCP. The MRA model is less reliable than the ANN and FL models in terms of predicting RCCP compressive strength. The best model is the FL model because of its user-friendliness and efficiency.
- Research Article
- 10.2139/ssrn.5942756
- Jan 1, 2026
- SSRN Electronic Journal
Introduction River flow forecasting has been one of the important challenges of water resources management in recent decades, so many researchers have proposed different methods to improve the performance of forecasting models. In the last decade, artificial intelligence methods have been most widely used in the simulation of various processes, including hydrological processes, due to their flexibility and high accuracy in modeling. However, the results of these methods have always been associated with uncertainty due to several factors such as structure, algorithm, input data, and the type of method chosen for data calibration. One way to address this problem is to analyze the uncertainty of the predictions made by these models. Materials and Methods In this study, the uncertainty of the results of artificial neural network (ANN) and support vector machine (SVM) models in predicting the monthly flow of the river was evaluated. The time series of the monthly flow of the Ghezelozan River at the Bianlu-Yasaul stream gauging station over 39 years, from 1976 to 2014, was used, with 75% and 25% of the data used for training and testing the models, respectively. To estimate the monthly flow of the Ghezelozan River, six different input combinations were used, including the flow of one, two, and three months before and the number of the month of the flow. Then, the accuracy and performance of the models were compared using the coefficient of determination (R) and root mean square error (RMSE). Finally, the uncertainty of the results of the ANN and SVM models in predicting the monthly flow of the river was evaluated by the Monte-Carlo method using d-factor and 95PPU values. 
Results and Discussion The evaluation of the ANN model shows that the best performance is obtained with the combination in which the flow of the previous two months and the number of the month of the flow are the model inputs, giving R and RMSE values of 0.757 and 9.45, respectively. For the SVM model, the best performance is obtained with the combination in which the flow of one, two, and three months before and the number of the month of the flow are the model inputs, with R and RMSE values of 0.729 and 10.946, respectively. The uncertainty analysis showed that the ANN model has more uncertainty in its output values than the SVM model: the d-factor of the ANN model was larger than that of the SVM model in both the training and test phases. The SVM model, with d-factor and 95PPU values of 0.155 and 17.241, respectively, has less uncertainty in its output values than the ANN model, with d-factor and 95PPU values of 0.993 and 85.470, respectively. Accordingly, the number of observations falling within the 95% confidence range (95PPU) of the ANN model is significantly higher than for the SVM model in both the training and testing phases. Also, the results showed that both models have more uncertainty in the months with a large volume of water, which can be due to the complexity of the process and the involvement of uncertain factors in these months, as well as the effect of factors that are not considered in the structure of the predictive models. 
Conclusion The results of the ANN and SVM models in predicting the monthly flow of the Ghezelozan River showed that although the ANN model (R = 0.757, RMSE = 9.45) performs well compared to the SVM model (R = 0.729, RMSE = 10.946), its results are associated with high uncertainty. The comparison of the uncertainty analysis by the Monte-Carlo method showed that the SVM model, with d-factor and 95PPU values of 0.155 and 17.241, respectively, has less uncertainty in predicting the monthly flow of the Ghezelozan River than the ANN model, with d-factor and 95PPU values of 0.993 and 85.470, respectively, and in that respect is the better model. According to the results of this research, given that advanced artificial intelligence models also carry uncertainty, it is necessary to apply these methods in the fields of risk management and future planning to obtain the best performance.
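The abstract does not spell out how the d-factor and 95PPU values were computed. The sketch below assumes the definitions commonly used in Monte Carlo uncertainty analysis: the 95PPU band is bounded by the 2.5th and 97.5th percentiles of the ensemble at each time step, the reported 95PPU figure is the percentage of observations inside that band, and the d-factor is the mean band width divided by the standard deviation of the observations. All names and data here are hypothetical.

```python
import random
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def uncertainty_band(ensemble, observed):
    """ensemble[k][t] is the prediction of Monte Carlo run k at time step t.

    Returns (percentage of observations inside the 95PPU band, d-factor).
    """
    lower, upper = [], []
    for t in range(len(observed)):
        column = sorted(run[t] for run in ensemble)
        lower.append(percentile(column, 2.5))
        upper.append(percentile(column, 97.5))
    inside = sum(1 for o, l, u in zip(observed, lower, upper) if l <= o <= u)
    bracketed_pct = 100.0 * inside / len(observed)
    mean_width = sum(u - l for l, u in zip(lower, upper)) / len(observed)
    # Assumption: sample standard deviation of the observed series.
    d_factor = mean_width / statistics.stdev(observed)
    return bracketed_pct, d_factor

# Hypothetical monthly flows and a synthetic 500-run ensemble around them.
rng = random.Random(0)
flows = [12.0, 30.0, 55.0, 41.0, 18.0]
runs = [[f + rng.gauss(0.0, 5.0) for f in flows] for _ in range(500)]
pct_inside, d_factor = uncertainty_band(runs, flows)
print(f"95PPU coverage: {pct_inside:.1f}%  d-factor: {d_factor:.3f}")
```

Under these definitions a narrow band (small d-factor) that still brackets most observations is the desirable outcome, so the two numbers are best read together rather than separately.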
- Conference Article
- 10.1117/12.2325519
- Oct 11, 2018
The frequency and intensity of harmful algal blooms (HABs) have increased globally since the 1970s. The increase in HABs has negatively affected aquatic ecosystems and the aquaculture industry. The economic losses were about $ 1 billion in Europe, $ 100 million in USA and $ 121 billion in Korea per year. There have been various field monitoring campaigns for ecological and biological research. However, traditional HAB monitoring has limitations in both spatial and temporal coverage. Recently, multispectral remote sensing methods using satellite sensors have been widely used to monitor HABs in ocean and coastal areas. However, the satellite systems used in ocean and coastal research, such as MODIS and SeaWiFS, have limitations for studying complex coastlines because of their coarse spatial resolution (~ few km). In this research, we conducted two-year intensive monitoring of the South Sea of Korea from 2016 to 2017 at 62 sampling stations and used the Landsat-8 Operational Land Imager (OLI), which has 30 m spatial resolution. We used 4 bands (band 1 to 4), 4 band ratios (band 1 over band 3 and 4, and band 2 over band 3 and 4) and a mixed dataset of the 4 bands and 4 band ratios. The empirical OC algorithms showed poor performance, with r-squared under 0.25. Machine learning techniques, i.e., artificial neural network (ANN) and support vector machine (SVM), were applied to improve the estimation of chl-a from Landsat-8. Parameters for developing the ANN and SVM models were optimized using a pattern search algorithm in a MATLAB toolbox. All datasets were divided into 80 % training and 20 % validation data. In the training step, the mixed dataset showed the best performance in both the ANN and SVM models, whereas in the validation step the 4-band ratio and 4-band datasets showed the best performance in ANN and SVM, respectively. The ANN model showed poor performance at low chl-a concentrations, but SVM was more accurate at low and mid concentrations. 
Both models under-estimated chl-a in the mid to high concentration range. For the mapping results, the ANN model using the 4-band dataset showed very low chl-a concentrations in most of the research area, whereas SVM showed high chl-a concentrations in the coastal area and bay. The results using the 4-band ratio dataset showed similar chl-a distributions for ANN and SVM. For the mixed dataset, the ANN model estimated over 8 mg m-3 of chl-a at some coastal locations, almost zero in the near-coastal area and over 2 mg m-3 in the off-shore area. In the case of SVM, all regions showed approximately 2 mg m-3 of chl-a. Landsat-8 OLI was not a suitable system for the OC algorithms. Machine learning techniques were effective tools for improving ocean chl-a estimation with Landsat-8 OLI. Thus, this study showed the potential of Landsat-8 OLI for coastal HAB monitoring.
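The three input datasets described in this abstract (4 bands, 4 band ratios, and the mixed set) can be built per pixel along the following lines. This is only a sketch: the function names and reflectance values are hypothetical, and the abstract does not say how zero or invalid reflectances were handled, so the guard below is an assumption.

```python
def band_ratio_features(b1, b2, b3, b4):
    """Return [b1/b3, b1/b4, b2/b3, b2/b4] for one pixel's band values."""
    if b3 <= 0 or b4 <= 0:
        # Assumption: pixels with non-positive denominators are rejected.
        raise ValueError("bands 3 and 4 must be positive to form ratios")
    return [b1 / b3, b1 / b4, b2 / b3, b2 / b4]

def mixed_features(b1, b2, b3, b4):
    """Return the 4 raw bands followed by the 4 band ratios (8 features)."""
    return [b1, b2, b3, b4] + band_ratio_features(b1, b2, b3, b4)

# Hypothetical surface reflectances for a single Landsat-8 OLI pixel.
pixel = (0.04, 0.06, 0.02, 0.01)
print(band_ratio_features(*pixel))
print(mixed_features(*pixel))
```

Each feature vector would then be paired with the in situ chl-a measurement for that station before being split into training and validation sets.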
- Research Article
1
- 10.17977/um017v29i12024p28-42
- Apr 19, 2024
- Jurnal Pendidikan Geografi: Kajian, Teori, dan Praktek dalam Bidang Pendidikan dan Ilmu Geografi
Groundwater in coastal areas is one of the natural resources that is vulnerable to quality degradation due to population activities in coastal areas. This is also the case in Parangtritis Village, a coastal area with various potential regions for the population's welfare, ranging from tourism to agriculture, animal husbandry, and fisheries. Therefore, this study explores groundwater quality in Parangtritis Village, Kretek District, Bantul Regency, Yogyakarta. Groundwater quality data was collected through field surveys based on land use, with water quality parameters including odor, color, nitrate, nitrite, and E. coli. The Minister of Health Regulation Document Number 32 of 2017 was adopted as a benchmark for groundwater quality in the research area. Further, by using the gathered data, the groundwater quality was classified based on limiting parameters. Groundwater quality is distributed based on limiting parameters such as odor, color, nitrite, and E. coli bacteria. Odor and color limitations are found in agricultural areas, tourism areas, and fish farms. Nitrite limitations are found in residential and livestock areas. E. coli bacteria limitations are found in all land use areas.
- Research Article
33
- 10.1080/23744731.2018.1510270
- Sep 26, 2018
- Science and Technology for the Built Environment
This article compares two modeling approaches for optimal operation of a turbo chiller installed in an office building: (1) a machine learning model developed with an artificial neural network (ANN) and (2) a hybrid machine learning model developed with the ANN model and available physical knowledge of the chiller. Before developing the ANN model of the chiller, the authors used a Gaussian mixture model to check the validity of the measured data. Then, the hybrid model was developed by combining the ANN model and physics-based regression equations from the EnergyPlus engineering reference. It was found that both the ANN and hybrid ANN models predict the chiller’s power consumption satisfactorily: mean bias error (MBE) = −2.63%, coefficient of variation of the root mean square error (CVRMSE) = 8.05% by the ANN model; MBE = −3.99%, CVRMSE = 11.98% by the hybrid ANN model. However, the hybrid model requires fewer inputs (four inputs) than the ANN model (eight inputs). The energy savings of both models are similar: coefficient of performance (COP) = 4.32 by the optimal operation of the ANN model; COP = 4.44 by the optimal operation of the hybrid ANN model. In addition, the hybrid ANN model can be applied where the ANN model is unable to provide accurate predictions.
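The two error measures quoted above can be sketched as below. Definitions vary between guidelines, so this is an assumption rather than the article's exact formula: MBE is taken as the sum of (predicted minus measured) over the sum of measured values, so a negative value means under-prediction, and CVRMSE is the RMSE divided by the mean measured value. The power readings are invented.

```python
import math

def mbe_percent(measured, predicted):
    """Mean bias error in percent; negative means the model under-predicts."""
    bias = sum(p - m for m, p in zip(measured, predicted))
    return 100.0 * bias / sum(measured)

def cvrmse_percent(measured, predicted):
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    rmse = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return 100.0 * rmse / (sum(measured) / n)

# Hypothetical chiller power readings (kW) and model predictions.
measured_kw = [100.0, 200.0, 300.0]
predicted_kw = [90.0, 210.0, 290.0]
print(f"MBE = {mbe_percent(measured_kw, predicted_kw):.2f}%")
print(f"CVRMSE = {cvrmse_percent(measured_kw, predicted_kw):.2f}%")
```

MBE lets positive and negative errors cancel, which is why it is usually reported alongside CVRMSE rather than on its own.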
- Research Article
96
- 10.1080/15435075.2011.602156
- Nov 1, 2011
- International Journal of Green Energy
This paper presents models for global and diffuse solar energy on a horizontal surface for five main sites in Malaysia. The global solar energy is modeled using linear, nonlinear, fuzzy logic, and artificial neural network (ANN) models, while the diffuse solar energy is modeled using linear, nonlinear, and ANN models. Three statistical values are used to evaluate the developed solar energy models, namely, the mean absolute percentage error, MAPE; root mean square error, RMSE; and mean bias error, MBE. The results showed that the ANN models are superior to the other models: the MAPE in calculating the global solar energy in Malaysia by the ANN model is 5.38%, while the MAPE values for the linear, nonlinear, and fuzzy logic models are 8.13%, 6.93%, and 6.71%, respectively. The results for the diffuse solar energy showed that the MAPE of the ANN model is 1.53%, while the MAPE values of the linear and nonlinear models are 4.35% and 3.74%, respectively. The accurate ANN models can therefore be used to predict solar energy in Malaysia and nearby regions.
- Research Article
137
- 10.1061/(asce)0733-9429(2006)132:12(1321)
- Dec 1, 2006
- Journal of Hydraulic Engineering
This study presents the development of artificial neural network (ANN) and fuzzy logic (FL) models for predicting event-based rainfall runoff and tests these models against the kinematic wave approximation (KWA). A three-layer feed-forward ANN was developed using the sigmoid function and the backpropagation algorithm. The FL model was developed employing the triangular fuzzy membership functions for the input and output variables. The fuzzy rules were inferred from the measured data. The measured event based rainfall-runoff peak discharge data from laboratory flume and experimental plots were satisfactorily predicted by the ANN, FL, and KWA models. Similarly, all the three models satisfactorily simulated event-based rainfall-runoff hydrographs from experimental plots with comparable error measures. ANN and FL models also satisfactorily simulated a measured hydrograph from a small watershed 8.44 km2 in area. The results provide insights into the adequacy of ANN and FL methods as well as their competitiveness against the KWA for simulating event-based rainfall-runoff processes.
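The triangular fuzzy membership functions mentioned in this abstract can be illustrated with a short sketch. The fuzzy set names and breakpoints below are hypothetical and are not taken from the study, which inferred its rules from measured data.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if not a < b < c:
        raise ValueError("require a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# Hypothetical fuzzy sets for rainfall intensity (mm/h).
RAINFALL_SETS = {
    "low": (-1.0, 0.0, 40.0),
    "medium": (20.0, 50.0, 80.0),
    "high": (60.0, 100.0, 101.0),
}

def fuzzify(intensity):
    """Return the degree of membership of one input in every fuzzy set."""
    return {name: triangular(intensity, *abc) for name, abc in RAINFALL_SETS.items()}

print(fuzzify(30.0))
```

A full fuzzy logic model would combine such membership degrees through IF-THEN rules and then defuzzify the result; only the fuzzification step is shown here.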
- Research Article
17
- 10.1177/155892501400900406
- Dec 1, 2014
- Journal of Engineered Fibers and Fabrics
The aim of this paper was to predict the needle penetration force in denim fabrics based on sewing parameters by using the fuzzy logic (FL) model. Moreover, the performance of fuzzy logic model is compared with that of the artificial neural network (ANN) model. The needle penetration force was measured on the Instron tensile tester. In order to plan the fuzzy logic model, the sewing needle size, number of fabric layers and fabric weight were taken into account as input parameters. The output parameter is needle penetration force. In addition, the same parameters and data are used in artificial neural network model. The results indicate that the needle penetration force can be predicted in terms of sewing parameters by using the fuzzy logic model. The difference between performance of fuzzy logic and neural network models is not meaningful (R_FL = 0.971 and R_ANN = 0.982). It is concluded that soft computing models such as fuzzy logic and artificial neural network can be utilized to forecast the needle penetration force in denim fabrics. Using the fuzzy logic model for predicting the needle penetration force in denim fabrics can help the garment manufacturer to acquire better knowledge about the sewing process. As a result, the sewing process may be improved, and also the quality of denim apparel increased.
- Research Article
25
- 10.1016/j.heliyon.2024.e33082
- Jun 19, 2024
- Heliyon
Monitoring of groundwater (GW) resources in coastal areas is vital for human needs, agriculture, ecosystems, securing water supply, biodiversity, and environmental sustainability. Although the utilization of water quality index (WQI) models has proven effective in monitoring GW resources, it has faced substantial criticism due to its inconsistent outcomes, prompting the need for more reliable assessment methods. Therefore, this study addressed this concern by employing the data-driven root mean squared (RMS) models to evaluate groundwater quality (GWQ) in the coastal Bhola district near the Bay of Bengal, Bangladesh. To enhance the reliability of the RMS-WQI model, the research incorporated the extreme gradient boosting (XGBoost) machine learning (ML) algorithm. For the assessment of GWQ, the study utilized eleven crucial indicators, including turbidity (TURB), electric conductivity (EC), pH, total dissolved solids (TDS), nitrate (NO3−), ammonium (NH4+), sodium (Na), potassium (K), magnesium (Mg), calcium (Ca), and iron (Fe). In terms of the GW indicators, concentration of K, Ca and Mg exceeded the guideline limit in the collected GW samples. The computed RMS-WQI scores ranged from 54.3 to 72.1, with an average of 65.2, categorizing all sampling sites' GWQ as “fair.” In terms of model reliability, XGBoost demonstrated exceptional sensitivity (R2 = 0.97) in predicting GWQ accurately. Furthermore, the RMS-WQI model exhibited minimal uncertainty (<1 %) in predicting WQI scores. These findings implied the efficacy of the RMS-WQI model in accurately assessing GWQ in coastal areas, that would ultimately assist regional environmental managers and strategic planners for effective monitoring and sustainable management of coastal GW resources.
- Research Article
45
- 10.3390/en13071663
- Apr 2, 2020
- Energies
This study presents the analysis and estimation of the hydrogen production from coffee mucilage mixed with organic wastes by dark anaerobic fermentation in a co-digestion system using an artificial neural network and fuzzy logic model. Different ratios of organic wastes (vegetal and fruit garbage) were added and combined with coffee mucilage, which led to an increase of the total hydrogen yield by providing proper sources of carbon, nitrogen, mineral, and other nutrients. A two-level factorial experiment was designed and conducted with independent variables of mucilage/organic wastes ratio, chemical oxygen demand (COD), acidification time, pH, and temperature in a 20-L bioreactor in order to demonstrate the predictive capability of two analytical modeling approaches. An artificial neural network configuration of three layers with 5-10-1 neurons was developed. The trapezoidal fuzzy functions and an inference system in the IF-THEN format were applied for the fuzzy logic model. The quality fit between experimental hydrogen productions and analytical predictions exhibited a predictive performance on the accumulative hydrogen yield with the correlation coefficient (R2) for the artificial neural network (> 0.7866) and fuzzy logic model (> 0.8485), respectively. Further tests of anaerobic dark fermentation with predefined factors at given experimental conditions showed that fuzzy logic model predictions had a higher quality of fit (R2 > 0.9508) than those from the artificial neural network model (R2 > 0.8369). The findings of this study confirm that coffee mucilage is a potential resource as the renewable energy carrier, and the fuzzy-logic-based model is able to predict hydrogen production with a satisfactory correlation coefficient, which is more sensitive than the predictive capacity of the artificial neural network model.
- Research Article
50
- 10.1007/s13201-022-01810-4
- Nov 21, 2022
- Applied Water Science
Groundwater is one of the most important natural resources in the world and is widely used for irrigation purposes. Groundwater quality is affected by various natural heterogeneities and anthropogenic activities. Consequently, monitoring groundwater quality and assessing its suitability are crucial for sustainable agricultural irrigation. In this study, the suitability of groundwater for irrigation was determined by using sodium adsorption ratio (SAR), residual sodium carbonate (RSC), Kelly index (KI), percentage of sodium (Na%), magnesium ratio (MR), potential salinity (PS) and permeability index (PI). For this purpose, groundwater samples were collected from 37 different sampling stations and analyzed. Along with suitability analysis, artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) models were used to predict irrigation water quality parameters. The models were evaluated by comparing the measured values and the predicted values using the statistical criteria [coefficient of determination (R2), mean absolute error (MAE), root mean square error (RMSE) and Nash–Sutcliffe efficiency (NS)]. In the estimation of all irrigation water quality parameters, the ANN model performed much better than the ANFIS model. Spatial distribution maps were generated for measured and ANN model-estimated irrigation water quality indices using the IDW interpolation method. Spatial distributions of groundwater quality indices revealed that MR was higher than the allowable limits in most of the study area, while the other quality criteria were within the permissible limits. The interpolation maps obtained from the artificial intelligence methods were found to agree adequately with the maps based on observed values. 
Based on the present findings, ANN models could be used as an efficient tool for estimating groundwater quality indices in unsampled sections of the study area and the other regions with similar conditions.
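The IDW (inverse distance weighting) interpolation used for the spatial distribution maps can be sketched in a few lines. The power of 2 and the sample coordinates and values are assumptions for illustration; the abstract does not state the settings the authors used in their GIS software.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (xs, ys, value) tuples for the measured stations.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xs, ys, value in samples:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            return value  # exactly on a station: return its measured value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: (x, y, index value).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(stations, 1.0, 0.0))   # midway between the two stations
print(idw(stations, 0.5, 0.0))   # closer to the first station
```

Evaluating this function over a regular grid of (x, y) points yields the raster from which a distribution map is drawn.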