Kernel principal component analysis-based water quality index modelling for coastal aquifers in Saudi Arabia

Abstract

This study developed a novel Water Quality Index (WQI) using Kernel Principal Component Analysis (Kernel PCA) to assess groundwater quality (GWQ) in the coastal aquifers of Al-Qatif, Saudi Arabia. A total of 39 groundwater samples were collected from shallow and deep wells and analyzed for key physicochemical parameters. Six kernel types were tested, and the polynomial kernel was found to be the most effective in preserving variance while reducing dimensionality. The Kernel PCA-based WQI classified wells into ‘Very Bad,’ ‘Bad,’ and ‘Medium’ categories, with scores such as W3 (WQI = 25.51, “Very Bad”), W31 (WQI = 46.7, “Bad”), and W38 (WQI = 56.75, “Medium”). Salinity and EC showed poor Sub-Index (SI) scores, reflecting the impact of seawater intrusion and over-extraction, while pH consistently showed high SI values (100), indicating natural buffering. By integrating non-linear dimensionality reduction, the proposed framework enhances traditional WQIs and facilitates more targeted and transparent groundwater decision-making. This includes identifying priority wells for remediation and supporting sustainable abstraction policies. The findings offer insight into sustainable water management in arid and semi-arid regions that are confronting groundwater degradation.
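The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the Kernel PCA step with a polynomial kernel. The sample values, the kernel degree and offset, and all function names are hypothetical. The leading eigenvector is found by plain power iteration to keep the example dependency-free, which stands in for a full eigendecomposition, and the step from component scores to a WQI is not shown because the abstract does not specify it.

```python
import math

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (coef0 + x.y)^degree; degree and coef0 are assumed values."""
    return (coef0 + sum(a * b for a, b in zip(x, y))) ** degree

def centered_kernel(X, kernel=poly_kernel):
    """Kernel matrix with the feature-space mean removed (double centering)."""
    n = len(X)
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    row_mean = [sum(row) / n for row in K]   # K is symmetric, so these are also column means
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def matvec(K, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(k * x for k, x in zip(row, v)) for row in K]

def first_component(Kc, iters=2000):
    """Largest eigenvalue and unit eigenvector of Kc by power iteration."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]     # deterministic, non-constant start vector
    for _ in range(iters):
        w = matvec(Kc, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("kernel matrix has no variance to explain")
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(Kc, v)))
    return lam, v

# Hypothetical standardized measurements (rows: wells, columns: two parameters).
samples = [[0.1, 1.2], [0.4, 0.9], [-1.0, -0.3], [1.5, 2.0], [-0.8, -1.1]]
Kc = centered_kernel(samples)
lam, v = first_component(Kc)
scores = [math.sqrt(lam) * vi for vi in v]                # projections on the first component
explained = lam / sum(Kc[i][i] for i in range(len(Kc)))   # share of total kernel variance
```

Comparing `explained` across kernels is one plausible way to judge which kernel preserves the most variance, though the criterion used in the study may differ.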

Similar Papers
  • Research Article
  • 10.11648/j.ajep.20241305.14
Spatial Distribution Analysis of Groundwater Quality Parameters in the East Region of Burkina Faso Using GIS Techniques
  • Oct 31, 2024
  • American Journal of Environmental Protection
  • Issoufou Ouedraogo + 3 more

Groundwater quality assessment is critical for achieving Sustainable Development Goal 6 (SDG-6), which aims to ensure the availability and sustainable management of water and sanitation for all. In Burkina Faso, groundwater is a vital natural resource supporting socio-economic development, particularly in arid and semi-arid regions where water scarcity and quality are significant challenges. Climatic conditions in the country, characterized by a long, hot, dry season followed by a short rainy period, result in considerable variability in water availability. Rapid population growth exacerbates these challenges by increasing water demand in both urban and rural areas, putting additional pressure on already limited water resources. Moreover, the expansion of mining and agricultural activities further stresses these resources through contamination from hazardous substances and over-extraction. The use of fertilizers and pesticides contributes to pollution, posing serious risks to human health and local ecosystems. Given the strategic importance of groundwater for Burkina Faso's development amid these growing challenges, a comprehensive understanding of groundwater quality is essential. This study focuses on the Eastern Region of Burkina Faso and aims to analyze the spatial distribution of physicochemical parameters related to groundwater quality in order to support sustainable water resource management and public health initiatives. Water samples from 42 sites were collected and analyzed for parameters such as pH, electrical conductivity (EC), total dissolved solids (TDS), and concentrations of calcium, magnesium, sodium, potassium, chloride, sulfate, bicarbonate, and nitrate. The data were processed using the Inverse Distance Weighted (IDW) interpolation method in ArcGIS 10.8 to produce spatial maps of these parameters. 
A Water Quality Index (WQI) was calculated to classify groundwater quality as "Excellent" (WQI < 50), "Good" (50 ≤ WQI ≤ 100), or "Poor" (WQI > 100). The results revealed significant spatial variability in groundwater quality, with concentrations sometimes exceeding WHO standards. Specifically, 38.10% of the analyzed samples exceeded the standard for nitrates, while 28.57% of the samples showed turbidity above recommended thresholds. TDS levels varied considerably, reaching a maximum of 1,336 mg/L, and electrical conductivity reached 1,336 µS/cm. These results demonstrate marked heterogeneity in water quality parameters across the region. The generated maps could serve as a valuable tool for decision-makers, enabling identification of areas requiring particular attention for groundwater quality management.
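The three-class scheme quoted above can be written as a small lookup. This is a sketch only: the function names are hypothetical, and the sample counts in the check are an inference from the reported percentages and the 42 sites, not figures stated in the abstract.

```python
def classify_wqi(wqi):
    """Map a WQI score to the three classes quoted in the abstract."""
    if wqi < 0:
        raise ValueError("WQI must be non-negative")
    if wqi < 50:
        return "Excellent"
    if wqi <= 100:
        return "Good"
    return "Poor"

def percent_exceeding(n_exceeding, n_total):
    """Percentage of samples above a guideline value, rounded to two decimals."""
    return round(100.0 * n_exceeding / n_total, 2)

# Consistency check (inferred counts): 16 and 12 of 42 samples reproduce
# the reported 38.10% (nitrate) and 28.57% (turbidity).
nitrate_share = percent_exceeding(16, 42)
turbidity_share = percent_exceeding(12, 42)
```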

  • Dissertation
  • 10.24355/dbbs.084-201101060930-0
Water quality modeling of large reservoirs in semi-arid regions under climate change – Example Lake Nasser (Egypt)
  • Dec 10, 2010
  • Mohamed Elshemy

In this work, a hydrodynamic and water quality model was developed for Lake Nubia based on a two-dimensional, laterally averaged, finite-difference hydrodynamic and water quality code, CE-QUAL-W2. The model was calibrated and verified using data measured during the low flood periods of 2006 and 2007, respectively. Measurements during the flood season are not available. The results of the presented model show good agreement with the observed hydrodynamic and water quality records. Two water quality indices (WQIs), NSF WQI and CCME WQI, were developed to assess the state of water quality in the investigated case study, Lake Nubia, during the first low flood period of January 2006. The CCME WQI was modified to use the Egyptian standards (objectives) for raw water. Moreover, two trophic status indices, Carlson TSI and LAWA TI, were developed to evaluate the trophic status of Lake Nubia during the same period of January 2006. Results of the previously developed hydrodynamic and water quality model for Lake Nubia were used to validate the model. According to the developed water quality indices, Lake Nubia has a good water quality state during the low flood period. The modified CCME WQI (based on measured data) indicates that the Lake Nubia water quality state is excellent according to the Egyptian standards of water quality for surface waterways. Results of the applied trophic status indices show that Lake Nubia was eutrophic during the studied period. The effect of global climate change on the hydrodynamic and water quality characteristics of Lake Nubia was investigated for the 21st century. To do that, the outputs of eleven global climate models for two global emissions scenarios, combined with hydrological modeling, were used. 
A theoretical process algorithm was simplified, further developed and calibrated to modify the initial conditions of dissolved oxygen to reflect global climate change effects. A sensitivity analysis was conducted by using the predicted air temperature and inflow data separately in the model in order to investigate the effect of each on the hydrodynamic and water quality characteristics. Three hydrodynamic characteristics of the reservoir were investigated with respect to climate change: water surface levels, evaporation water losses and thermal structure. In addition, eight water quality characteristics of the reservoir were investigated with respect to climate change: dissolved oxygen, chlorophyll-a, ortho-phosphate, nitrate-nitrite, ammonium, total dissolved solids, total suspended solids and potential of hydrogen (pH). Moreover, the climate change effects on the water quality and trophic status indices were studied. The results of the climate change study show partially significant impacts on the examined hydrodynamic and water quality characteristics, while the water quality and trophic status indices are only slightly affected by the climate change scenarios.

  • Research Article
  • 10.31357/fesympo.v24i0.4344.g3449
Application of Water Quality Index to Monitor Ground Water Quality: A Case Study in Colombo Catchment of Sri Lanka
  • Nov 28, 2019
  • I Dharmasoma + 2 more

Deterioration of groundwater quality directly threatens the livability of a community. Sri Lanka is currently undergoing a rapid increase in the demand for water, particularly for urban/rural water supplies, irrigated agriculture and the industrial sector, exerting considerable pressure on the available groundwater resources. This study was carried out to assess the status of groundwater quality around the Parliament Lake, in Colombo catchment, Sri Lanka, by employing the Canadian Council of Ministers of the Environment (CCME) Water Quality Index (WQI) from September 2016 to September 2017 (one year). The objective of the study was to assess the suitability of groundwater in the study area as potable water through the CCME WQI. Water samples were collected from thirty-four (34) locations including twenty-six (26) domestic shallow wells and eight (08) deep wells. In-situ measurements of pH, Temperature, Dissolved Oxygen, Total Dissolved Solids, Electrical Conductivity and Salinity were conducted monthly, while laboratory testing for Ammonia, Nitrate, Phosphate, Chemical Oxygen Demand and Biological Oxygen Demand was conducted twice for fifteen (15) selected wells during the project period. The CCME WQI was calculated taking pH, Temperature, Dissolved Oxygen, Total Dissolved Solids, and Electrical Conductivity into account. Results revealed that Nitrate, Sulphate and Calcium levels of both shallow and deep wells were within the Maximum Permissible Levels in the SLS 614:1983 drinking water standards. The Nitrate levels of both shallow and deep wells were comparatively high in the dry season; in contrast, Phosphate, Calcium, Sulphate and BOD values in most of the shallow and deep wells were comparatively high in the wet season. Ammonia levels in five (05) of the fifteen (15) selected shallow wells exceeded the maximum permissible level given in the standards. 
The highest COD levels in the dry and wet seasons were 42.0 mg/l and 88.0 mg/l, respectively, indicating that the water is unsuitable for drinking. According to the CCME WQI, twenty-three (23) out of twenty-seven (27) shallow wells were at the “Marginal” level (85.19%) and one was in “Poor” condition (3.70%). The water quality of these twenty-four (24) shallow wells is frequently endangered or deteriorated. The CCME WQI values indicated that the water in four deep wells is of good quality (57.14%), whereas water in two deep wells is of Marginal quality. The present study revealed that GW 20, GW 08, GW 09, GW 10 and GW 28 have deteriorating water quality, with downgrading parameters of Electrical Conductivity, Salinity, and Total Dissolved Solids. Accordingly, a well-planned groundwater quality management mechanism is proposed to avoid further pollution. In addition, detailed studies to identify the causes of groundwater pollution should be conducted. Keywords: Ground water quality, Pollution, Colombo catchment, Water quality index
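For readers unfamiliar with the CCME WQI, the sketch below follows the standard three-factor formulation (scope F1, frequency F2, amplitude F3). It is simplified to parameters whose objective is an upper limit, so range-type objectives such as pH or lower-bound objectives such as dissolved oxygen are not handled. The measurements, limits, and function names are hypothetical and are not taken from the study.

```python
import math

def ccme_wqi(measurements, objectives):
    """CCME WQI for parameters whose objective is an upper limit.

    measurements: {parameter: [values]}, objectives: {parameter: max allowed}.
    """
    failed_vars = 0
    failed_tests = 0
    total_tests = 0
    excursion_sum = 0.0
    for name, values in measurements.items():
        limit = objectives[name]
        fails = [v for v in values if v > limit]
        total_tests += len(values)
        failed_tests += len(fails)
        failed_vars += 1 if fails else 0
        excursion_sum += sum(v / limit - 1.0 for v in fails)
    f1 = 100.0 * failed_vars / len(measurements)   # scope: % of parameters failing
    f2 = 100.0 * failed_tests / total_tests        # frequency: % of tests failing
    nse = excursion_sum / total_tests              # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                 # amplitude, scaled to 0..100
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

def ccme_category(score):
    """CCME classes; treating the bounds as continuous cut-offs is an assumption."""
    if score >= 95:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 65:
        return "Fair"
    if score >= 45:
        return "Marginal"
    return "Poor"

# Hypothetical well: one TDS reading out of six tests exceeds its limit.
readings = {"TDS": [400.0, 600.0, 450.0], "EC": [700.0, 720.0, 690.0]}
limits = {"TDS": 500.0, "EC": 750.0}
score = ccme_wqi(readings, limits)
category = ccme_category(score)
```

Because F1 counts failing parameters rather than failing tests, a single exceedance in a small parameter set can pull the index down sharply, as it does in this example.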

  • Research Article
  • Citations: 27
  • 10.3390/w11081702
Identification of the Hydrogeochemical Processes and Assessment of Groundwater Quality, Using Multivariate Statistical Approaches and Water Quality Index in a Wastewater Irrigated Region
  • Aug 16, 2019
  • Water
  • Ana Elizabeth Marín Celestino + 5 more

Groundwater quality and availability are essential for human consumption and social and economic activities in arid and semiarid regions. Many developing countries use wastewater for irrigation, which has in most cases led to groundwater pollution. The Mezquital Valley, a semiarid region in central Mexico, is the largest agricultural irrigation region in the world, and it has relied on wastewater from Mexico City for over 100 years. Limited research has been conducted on the impact of irrigation practices on groundwater quality in the Mezquital Valley. In this study, 31 drinking water wells were sampled. Groundwater quality was determined using the water quality index (WQI) for drinking purposes. The hydrogeochemical processes and the spatial variability of groundwater quality were analyzed using principal component analysis (PCA) and K-means clustering as multivariate geostatistical tools. This study highlights the value of combining various approaches, such as multivariate geostatistical methods and WQI, for the identification of hydrogeochemical processes in the evolution of groundwater in a wastewater-irrigated region. The PCA results revealed that salinization and pollution (wastewater irrigation and fertilizers), followed by geogenic sources (dissolution of carbonates), have a significant effect on groundwater quality. Groundwater quality evolution was grouped into cluster 1 and cluster 2, which were classified as unsuitable (low quality) and suitable (acceptable quality) for drinking purposes, respectively. Cluster 1 is located in wastewater-irrigated zones, urban areas, and the surroundings of the Tula River. Cluster 2 locations are found in recharge zones, rural settlements, and seasonal agricultural fields. The results of this study strongly suggest that water management strategies that include a groundwater monitoring plan, as well as research-based wastewater irrigation regulations, are warranted in the Mezquital Valley.

  • Research Article
  • 10.1007/s11356-025-36477-2
Combining clustering and ensemble learning for groundwater quality monitoring: a data-driven framework for sustainable water management.
  • May 14, 2025
  • Environmental science and pollution research international
  • Harjot Kaur + 3 more

This groundwater quality assessment study for the state of Punjab, India, utilized six Water Quality Index (WQI) models, i.e., NSF-AM, NSF-GM, CCME, Horton, West Java, and GPI, for potability assessment via machine learning (ML) classifiers. The results indicated poor groundwater quality in many regions of the state, falling below potability standards (WHO and BIS). The CCME WQI classified the state's groundwater as poor to marginal, rendering it unsuitable for human consumption. The disparities observed among WQI models highlighted differences in parameter selection, weight assignment, and aggregation techniques, emphasizing the need for a customized WQI framework for the Indian subcontinent for more accurate and robust groundwater quality assessment. K-means clustering, employed as a preprocessing step for improving classification accuracy, grouped data into two distinct clusters (validated by silhouette score = 0.927 and Calinski-Harabasz index = 129.21), revealing contamination sources' patterns, feature refinement, and enhancement. Further, application and performance analysis of ML classifiers integrated with K-means clustering identified Ensemble Hard Voting (EHV) and Ensemble Soft Voting (ESV) as top performers for groundwater quality classification. The GPI WQI combined with ESV achieved Accuracy = 99.13%, Precision = 100%, Recall = 99.03%, F1-score = 99.51%, Specificity = 100%, MCC = 0.95, Log Loss = 0.11, and AUC = 100% while maintaining moderate model and computational complexity (t_predict = 0.0095 s), underscoring the efficiency and suitability of the GPI and ESV blend for real-time water quality monitoring systems. 
The presented data-driven holistic framework highlights the capability of ML-driven groundwater assessment as a decision-support tool for resource-constrained regions, facilitating policy interventions and promoting sustainable water management practices by leveraging its accurate classification and real-time assessment capabilities.
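The difference between hard and soft voting can be shown with a minimal sketch. The three classifier outputs, the class labels, and the function names below are hypothetical; the abstract does not list the base classifiers or their probabilities.

```python
from collections import Counter

def hard_vote(label_predictions):
    """Majority label across classifiers for one sample (ties go to the first seen)."""
    return Counter(label_predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    """Class with the highest mean predicted probability across classifiers."""
    classes = list(probabilities[0].keys())
    n = len(probabilities)
    mean = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(mean, key=mean.get)

# Hypothetical outputs of three classifiers for a single groundwater sample.
probs = [
    {"potable": 0.9, "non-potable": 0.1},
    {"potable": 0.4, "non-potable": 0.6},
    {"potable": 0.4, "non-potable": 0.6},
]
labels = [max(p, key=p.get) for p in probs]   # each classifier's own top class
hard_result = hard_vote(labels)               # two of three say "non-potable"
soft_result = soft_vote(probs)                # mean P(potable) is about 0.57
```

Here one confident classifier outweighs two marginal ones under soft voting, which is why the two schemes can disagree on the same sample.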

  • Research Article
  • Citations: 5
  • 10.1155/2023/8199000
Hydrogeochemical Characterization and Appraisal of Groundwater Quality in Yisr River Catchment, Blue Nile River Basin, Ethiopia, by Using the GIS, WQI, and Statistical Techniques
  • Apr 18, 2023
  • Journal of Chemistry
  • Abebaw Demelash + 3 more

Groundwater is a primary drinking, agricultural, domestic, and nondomestic water source in Ethiopia’s Yisr River watershed of the Blue Nile River basin. There has been no systematic investigation of the hydrogeochemical properties of groundwater in the research area. The study investigated the hydrogeochemical parameters of groundwater in the catchment to find out if it is fit for drinking and irrigation. A total of 26 samples of groundwater were collected and analyzed for seventeen parameters, including pH, temperature, EC, TDS, TH, K+, Na+, Ca2+, Mg2+, Fe2+, Cl−, HCO3−, CO32−, SO42−, F−, PO43−, and NO3−. The data were processed and evaluated using integrated hydrogeochemical techniques, including individual ionic signatures, interionic ratios, and multivariate statistical methods, such as multiple correlation analysis, principal component analysis, and hierarchical cluster analysis. The water quality index (WQI) was used to judge the quality of water for drinking, and Na%, PI, RSC, SAR, EC, TDS, and MH were used to judge its quality for irrigation. The box plot diagram shows the dominant ions in descending order of Ca2+ > Mg2+ > Na+ > K+ for cations and HCO3− > Cl− > SO42− > NO3− > F− for anions. The chemical composition of shallow wells and springs indicates freshwater, whereas the deep groundwater wells are brackish. The two-factor loadings (principal component analysis) were used to explain the existence of anthropogenic and geogenic sources. Three clusters are identified in the dendrogram. The third cluster has the largest linkage distance among all the clusters. This means that the groundwater samples in this cluster are geochemically different from those in the other two clusters, and that this cluster is made up of only deep wells. Water quality indices showed that water quality ranged from excellent to very poor, with the majority (53.85%) being excellent and 26.9% being good. 
The results of the calculated indices for agricultural water quality indicated that the water quality in most collected samples was in the good and excellent categories; however, the EC, RSC, MH, and TDS indices in deep groundwater wells were found to be hazardous. The findings of this study are useful for understanding groundwater sustainability and are also helpful in supporting future water management and protection.

  • Book Chapter
  • Citations: 4
  • 10.5772/9367
Non-Linear Feature Extraction by Linear Principal Component Analysis Using Local Kernel
  • Feb 1, 2010
  • Kazuhiro Hotta

In the last decade, the effectiveness of kernel-based methods for object detection and recognition has been reported (Fukui et al., 2006; Hotta, 2008c; Kim et al., 2002; Pontil & Verri, 1998; Shawe-Taylor & Cristianini, 2004; Yang, 2002). In particular, Kernel Principal Component Analysis (KPCA) has taken the place of traditional linear PCA as the first feature extraction step in various studies and applications. KPCA copes well with non-linear variations. However, KPCA must solve an eigenvalue problem of size (number of samples) × (number of samples). In addition, kernel functions must be computed against all training samples to map a test sample to the subspace obtained by KPCA. Therefore, the computational cost is the main drawback. To reduce the computational cost of KPCA, sparse KPCA (Tipping, 2001) and the use of clustering (Ichino et al., 2007, in Japanese) were proposed. Ichino et al. (2007, in Japanese) reported that KPCA of cluster centers is more effective than sparse KPCA. However, the computational cost becomes a big problem again when the number of classes is large and each class has one subspace. For example, KPCA of visual words (cluster centers of local features) (Hotta, 2008b) was effective for object categorization, but the computational cost is high. In this method, each of the 101 categories has one subspace constructed from 400 visual words. Namely, 40,400 (= 101 categories × 400 visual words) kernel computations are required to map a local feature to all subspaces. On the other hand, the cost of traditional linear PCA is independent of the number of samples when the dimension of a feature is smaller than the number of samples. This is because the size of the eigenvalue problem depends on the minimum of the feature dimension and the number of samples. To map a test sample to a subspace, only inner products between basis vectors and the test sample are required. 
Therefore, in general, the computational cost of linear PCA is much lower than that of KPCA. In this paper, we propose how to obtain the non-linearity of KPCA and the computational cost of linear PCA simultaneously (Hotta, 2008a). Kernel-based methods map training samples to a high-dimensional space as x → φ(x). Non-linearity is realized by a linear method in the high-dimensional space. The dimension of the mapped feature space of the Radial Basis Function (RBF) kernel is infinite, and we cannot describe the mapped feature explicitly. However, the mapped feature φ(x) of the polynomial kernel can be described explicitly. This means that KPCA with the polynomial kernel can be solved directly by linear PCA of mapped features. Unfortunately, in general, the dimension of mapped features is too high to solve by linear PCA even if the 2nd-degree polynomial kernel K(x, y) = (1 + x^T y)^2 is used. The dimension of mapped features of the polynomial 5
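The claim that the polynomial kernel has an explicit feature map can be checked numerically. The sketch below writes out one standard choice of φ for the 2nd-degree kernel K(x, y) = (1 + x^T y)^2 and confirms that the inner product of mapped features equals the kernel value. The function names and the test vectors are illustrative and do not come from the chapter.

```python
import math
from itertools import combinations

def poly2_kernel(x, y):
    """2nd-degree polynomial kernel K(x, y) = (1 + x.y)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2

def poly2_features(x):
    """Explicit map phi(x) with phi(x).phi(y) == poly2_kernel(x, y)."""
    root2 = math.sqrt(2.0)
    feats = [1.0]                                             # constant term
    feats += [root2 * a for a in x]                           # linear terms
    feats += [a * a for a in x]                               # squared terms
    feats += [root2 * a * b for a, b in combinations(x, 2)]   # cross terms, i < j
    return feats

def mapped_dimension(d):
    """Length of phi(x) for a d-dimensional input: (d + 1)(d + 2) / 2."""
    return (d + 1) * (d + 2) // 2

x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]
kernel_value = poly2_kernel(x, y)
explicit_value = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
```

The quadratic growth of `mapped_dimension` (5,151 for a 100-dimensional input) illustrates why linear PCA on the mapped features becomes expensive for high-dimensional inputs.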

  • Research Article
  • Citations: 44
  • 10.1016/j.chemosphere.2023.139083
Fluoride and nitrate enrichment in coastal aquifers of the Eastern Province, Saudi Arabia: The influencing factors, toxicity, and human health risks
  • Jun 16, 2023
  • Chemosphere
  • S.I Abba + 5 more


  • Research Article
  • Citations: 35
  • 10.1007/s11356-021-16343-7
Multivariate statistics and entropy theory for irrigation water quality and entropy-weighted index development in a subtropical urban river, Bangladesh.
  • Sep 7, 2021
  • Environmental Science and Pollution Research
  • Md Abu Bakar Siddique + 11 more

Currently, a well-developed combination of irrigation water quality indices (IWQIs) and entropy water quality indices (EWQIs) for surface water appraisal in a polluted subtropical urban river is very scarce in the literature. To close this gap, we developed IWQIs by establishing statistics-based weights of variables recommended by FAO 29 standard value using the National Sanitation Foundation Water Quality Index (NSFWQI) compared with the proposed EWQIs based on information entropy in the Dhaleshwari River, Bangladesh. Fifty surface water samples were collected from five sampling locations during the dry and wet seasons and analyzed for sixteen variables. Principal component analysis (PCA), factor analysis (FA), Moran's spatial autocorrelation, and a random forest (RF) model were applied to the datasets. Weights were allocated for primary variables to compute IWQI-1, 2 and EWQI-1, 2, respectively. The resultant IWQIs showed a similar trend to the EWQIs and revealed poor to good quality water; IWQI-1 for the dry season and IWQI-2 for the wet season are further suggested. The entropy theory recognized that Mg2+, Cr, TDS, and Cl- for the dry season and Cd, Cr, Cl-, and SO42- for the wet season are the major contaminants that affect irrigation water quality. The primary input variables were ultimately reduced to ten shortlisted variables, which performed well in demonstrating water quality status, since weights were derived more effectively from PCA than from FA. The results of the RF model depict NO3-, Mg2+, and Cr as the most predominant variables influencing surface water quality. A significant dispersed pattern was detected for IWQImin-3 in the wet season (Moran's I>0). Overall, both IWQIs and EWQIs can make water quality control cost-effective and objective, establishing a scientific basis for sustainable water management in the study basin.

  • Research Article
  • Citations: 4
  • 10.1007/s11356-024-33814-9
Groundwater quality appraisal and zone mapping for agriculture utilities in Wadi Fatima, Saudi Arabia using water quality indices, boron and trace metals.
  • Jun 5, 2024
  • Environmental science and pollution research international
  • Burhan A M Niyazi + 3 more

Groundwater quality in Wadi Fatimah is evaluated and demarcated for agriculture utilities using comprehensive approaches namely, international standards, agricultural water quality (AWQ) indices, irrigation water quality index (IWQI), and trace metals. Groundwater samples were collected (n = 59) and analysed for EC, pH, major and minor ions and trace metals. According to FAO recommendations, 42% of samples (EC > 3000 µS/cm) are inappropriate for agricultural uses. AWQ indices including salinity hazard, Kelly's ratio and Na% show that 50%, 19% and 37% of samples, respectively, are unsuitable for agricultural uses. USSL classification reveals that groundwater is preferable only for high-permeability soils and salt-tolerant crops. IWQI suggests that 88% of samples are moderately usable for agriculture. The interrelationship between water salinity and crop yield justified that 73%, 59%, 51% and 25% of samples are desirable to yield 90% in date palm trees, sorghum, rice and citrus fruits, respectively. Groundwater is appropriate for date palm trees except in downstream regions. Boron concentration suggests that 52%, 81% and 92% of samples are suitable for sensitive, semi-tolerant and tolerant crops, respectively. Groundwater in the central part (suitable for sensitive crops), central and upstream regions (semi-tolerant crops) and all regions except downstream (tolerant crops) are suitable for cultivation. Trace metals contents illustrate that 36%, 34%, 22%, 8%, 5% and 100% of samples are inappropriate for agriculture due to high concentrations of Cr, Cu, Ni, V, Mn and Mo, respectively in the groundwater. Further, AWQ indices, IWQI, USSL classifications and trace metals ensure that groundwater in the downstream, and a few pockets in the upstream are unfit for agricultural uses. This study recommends that groundwater in this basin is more suitable for tolerant crops (ie. date palm, sorghum) followed by semi-tolerant and sensitive crops.

  • Research Article
  • Citations: 104
  • 10.3390/w14030483
Groundwater Suitability for Drinking and Irrigation Using Water Quality Indices and Multivariate Modeling in Makkah Al-Mukarramah Province, Saudi Arabia
  • Feb 6, 2022
  • Water
  • Maged El Osta + 4 more

Water shortage and quality are major issues in many places, particularly arid and semi-arid regions such as Makkah Al-Mukarramah province, Saudi Arabia. The current work was conducted to examine the geochemical mechanisms influencing the chemistry of groundwater and assess groundwater resources through several water quality indices (WQIs), GIS methods, and the partial least squares regression model (PLSR). To this end, 59 groundwater wells were tested for different physical and chemical parameters using conventional analytical procedures. The results showed that the average content of ions was as follows: Na+ > Ca2+ > Mg2+ > K+ and Cl− > SO42− > HCO3− > NO3− > CO32−. Under the stress of evaporation and saltwater intrusion associated with the reverse ion exchange process, the predominant hydrochemical facies were Ca-HCO3, Na-Cl, mixed Ca-Mg-Cl-SO4, and Na-Ca-HCO3. The drinking water quality index (DWQI) indicated that only 5% of the wells were categorized as good to excellent for drinking, while the majority (95%) were poor to unsuitable for drinking and required appropriate treatment. Furthermore, the irrigation water quality index (IWQI) indicated that 45.5% of the wells were classified under high to severe restriction for agriculture and can be utilized only for highly salt-tolerant plants. The majority (54.5%) were deemed moderate to no restriction for irrigation, with no toxicity concern for most plants. Agriculture indicators such as total dissolved solids (TDS), potential salinity (PS), sodium adsorption ratio (SAR), and residual sodium carbonate (RSC) had mean values of 2572.30, 33.32, 4.84, and −21.14, respectively. However, the quality of the groundwater in the study area improves with increased rainfall, which recharges the Quaternary aquifer. The PLSR models, which are based on physicochemical characteristics, have been shown to be the most efficient as alternative techniques for determining the six WQIs. 
For instance, the PLSR models of all IWQs had determination coefficients values of R2 ranging between 0.848 and 0.999 in the Cal., and between 0.848 and 0.999 in the Val. datasets, and had model accuracy varying from 0.824 to 0.999 in the Cal., and from 0.817 to 0.989 in the Val. datasets. In conclusion, the combination of physicochemical parameters, WQIs, and multivariate modeling with statistical analysis and GIS tools is a successful and adaptable methodology that provides a comprehensive picture of groundwater quality and governing mechanisms.
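Of the agriculture indicators named above, the sodium adsorption ratio has a compact closed form, SAR = Na / sqrt((Ca + Mg) / 2), with all concentrations in meq/L. The sketch below uses that standard definition; the sample concentrations and function names are hypothetical, and the equivalent weights are rounded textbook values.

```python
import math

# Approximate equivalent weights in mg per meq (atomic weight / valence).
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}

def to_meq_per_l(ion, mg_per_l):
    """Convert a concentration from mg/L to meq/L for the listed ions."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]

def sar(na, ca, mg):
    """Sodium adsorption ratio; all inputs in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)

# Hypothetical sample given directly in meq/L.
example_sar = sar(na=10.0, ca=4.0, mg=4.0)

# Hypothetical sample given in mg/L and converted first.
na_meq = to_meq_per_l("Na", 229.9)
```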

  • Conference Article
  • Citations: 1
  • 10.1109/wcica.2012.6359194
A soft sensor method based on Integrated PCA
  • Jul 1, 2012
  • Weiming Shao + 1 more

Feature extraction methods such as Kernel Principal Component Analysis (KPCA) and Principal Component Analysis (PCA), are often used for soft sensor modeling in industrial process with high dimensional data. A kind of soft sensor method based on Integrated Principal Component Analysis (Integrated PCA) is proposed for some weakness of KPCA and that of PCA. This approach combines nonlinear information extracted by KPCA with linear information extracted by PCA and it can not only reduce the dimensionality of input data, but also make full use of linear and nonlinear information. Partial Least Squares (PLS) is used to obtain the final soft sensor model and Particle Swarm Optimization (PSO) is applied to get the optimal parameters of Integrated PCA and those of KPCA. Finally, the proposed method is applied to build soft sensor models of diesel oil boiling point and other industrial objects and is proved to have better ability of generalization by being compared with other feature extraction methods.

  • Research Article
  • Citations: 60
  • 10.1007/s10661-022-09845-5
Geospatial assessment of water quality using principal components analysis (PCA) and water quality index (WQI) in Basho Valley, Gilgit Baltistan (Northern Areas of Pakistan).
  • Feb 7, 2022
  • Environmental Monitoring and Assessment
  • Syeda Urooj Fatima + 6 more

Public health in Gilgit Baltistan (GB) is under threat due to multiple water-borne diseases. Anthropogenic activities are increasing the pollution load on the glacio-fluvial streams and surface water resources of Basho Valley in Skardu district of GB. The present research investigated the quality of the drinking water used for domestic purposes in the Basho Valley. The study also assesses public health status by addressing the basic drinking water quality parameters. A total of 23 water samples were collected and then analyzed to elucidate the current status of physico-chemical, metal, and microbial parameters. Principal component analysis (PCA) was applied, and three principal components were obtained, together accounting for 53.04% of the total variance. PCA identified metallic and microbial parameters as the major factors influencing the water quality of the valley. Meanwhile, the water quality index (WQI) was also computed, and it was observed that the WQI of the valley is excellent in terms of physico-chemical characteristics; however, the metal and microbial WQIs show that most of the samples are unfit for drinking. Spatial distribution was also interpolated using Inverse Distance Weighting (IDW) to anticipate the mean values of parameters and WQI scores. The study concludes that water quality is satisfactory in terms of physico-chemical characteristics; however, analysis of metals shows that the concentrations of copper (Cu) (0.40 ± 0.16 mg/L), lead (Pb) (0.24 ± 0.10 mg/L), zinc (Zn) (6.77 ± 27.1 mg/L), manganese (Mn) (0.19 ± 0.05), and molybdenum (Mo) (0.07 ± 0.02 mg/L) exceed the maximum permissible limits set in the WHO guidelines for drinking water. Similarly, the results of the microbial analysis indicate that the water samples are heavily contaminated with fecal pollution (TCC, TFC, and TFS > 3 MPN/100 mL). 
On the basis of PCA, WQI, and IDW, the main sources of pollution are most likely anthropogenic activities, including the incoming pollution load from upstream channels. A few underlying natural sources, such as weathering and erosion, may also release metals into surface water and groundwater. This study recommends ensuring public health through regular monitoring and assessment of water resources in the valley.

  • Research Article
  • Cited by 164
  • 10.1007/s12665-016-5823-y
Assessment of groundwater quality of Lakshimpur district of Bangladesh using water quality indices, geostatistical methods, and multivariate analysis
  • Jun 1, 2016
  • Environmental Earth Sciences
  • Mohammad Amir Hossain Bhuiyan + 5 more

Groundwater evaluation indices, multivariate statistical techniques, and geostatistical models are applied to assess the source apportionment and spatial variability of groundwater pollutants in the Lakshimpur district of Bangladesh. A total of 70 groundwater samples were collected from wells (shallow to deep, i.e., 10–375 m) in the study area. The groundwater quality index reveals that 50% of the water samples are of good quality. The degree of contamination, heavy metal pollution index, and heavy metal evaluation index give diverse results across samples even though they are significantly correlated with one another. The results of principal component analysis (PCA) show that groundwater quality in the study area is governed mainly by geogenic sources (weathering and geochemical alteration of source rock), followed by anthropogenic sources (agrogenic, domestic sewage, etc.). Cluster analysis and the correlation matrix also support the results of PCA. Gaussian semivariogram models were identified as the best-fit models for most of the water quality indices and PCA components. The semivariogram results show that most of the variables have weak spatial dependence, indicating agricultural and residential/domestic influences. The spatial distribution maps of water quality parameters provide a useful and robust visual tool for decision makers in defining adaptive measures. This study demonstrates the use of multiple approaches for assessing the quality and spatial variability of groundwater as an effort toward more effective groundwater quality management.

  • Research Article
  • 10.1007/s10653-025-02806-0
Surface water quality evaluation impacting drinking water sources and sanitation using water quality index, multivariate techniques, and interpretable machine learning models in Mahanadi River, Odisha (India).
  • Oct 14, 2025
  • Environmental geochemistry and health
  • Abhijeet Das

Water quality and quantity affect crop productivity, with surface water quality having a significant impact. The amount of surface water being used for drinking is gradually rising. Thus, assessing surface water quality and related hydro-chemical characteristics is essential for surface water resource management in the Mahanadi River Basin, Odisha. The current study examined surface water quality and suitability for drinking and agriculture using several techniques: the Weighted Arithmetic (WA) Water Quality Index (WQI); multivariate methods, namely Pearson correlation, Cluster Analysis (CA), and Principal Component Analysis (PCA); and six machine learning (ML) techniques, namely Gaussian process regression (GPR), stepwise linear regression, fit binary tree (FBT), support vector regression, SVM (linear and polynomial kernels), and artificial neural network (ANN), to predict the WQI for sustainable use of the surface water resources. Thirteen physicochemical parameters were used to analyse eleven surface water samples, indicating that the primary cation and anion concentrations were as follows: Mg2+ > Ca2+ > K+ > Na+, and HCO3- > Cl- > SO42- > NO3-, respectively. The best input combination for WQI model prediction was identified using subset regression analysis. These eight input combinations had high R2, ranging from 0.975 to 1, and high adjusted R2, ranging from 0.974 to 1. The WAWQI range is divided into five categories: excellent (18.18%), good (18.18%), poor (27.27%), very poor (27.27%), and unsuitable (9.09%). The study found that increased turbidity, carbonate weathering, and the growth of agricultural and urban-industrial sectors regulate the geographical variation in surface water quality. The correlation results show significant positive correlations between TDS and TH (0.87), Mg2+ and turbidity (0.84) and coliform (0.78), Ca2+ and coliform (0.72), Cl- and HCO3- (0.83), and K+ and Na+ (0.7). 
According to the correlation study, these ions are enriched in the surface water by major anthropogenic activity. CA and PCA were used to determine the factors governing the surface water. CA differentiated three clusters based on the sources, hydrogeochemical environment, and reactions between chemical variables, and PCA shows that the first three principal components (PCs) account for 84.76% of the overall variation. Hence, CA and PCA point to several processes as the main sources of the ions, such as carbonate weathering, silicate weathering, and evaporite dissolution. According to the stepwise fitting model, bicarbonate was a non-significant variable for the WQI, whereas turbidity, pH, and coliform were the most significant factors. With a high correlation of 1 and low errors, the GPR, stepwise linear regression, and ANN models outperformed the others during the training and testing phases. In contrast, the SVM and FBT models showed the lowest performance during the training and testing stages. In conclusion, combining physicochemical characteristics, WQI, CA, PCA, and ML tools to assess the suitability of surface water for drinking and irrigation, and the variables that regulate it, is beneficial and provides a clear picture of water quality. Future research should improve data accuracy to increase model precision and extend the approach to other geographical and environmental settings.

More from: Scientific Reports
  • New
  • Research Article
  • 10.1038/s41598-025-24936-2
Proactive identification of cybersecurity compromises via the PROID compromise assessment framework.
  • Nov 7, 2025
  • Scientific reports
  • Abdulaziz Abdullah Alkhalaf + 3 more

  • New
  • Research Article
  • 10.1038/s41598-025-05663-0
Microbiological and pharmacological investigation of phytochemicals extracted from selected ethnomedicinal plants with their potential against food pathogen.
  • Nov 7, 2025
  • Scientific reports
  • Aya M Abdel Gawad + 6 more

  • New
  • Research Article
  • 10.1038/s41598-025-25573-5
Unsupervised spectra information extraction using physics-informed neural networks in the presence of non-linearities and multi-agent problems.
  • Nov 7, 2025
  • Scientific reports
  • Alessandro Puleio + 1 more

  • New
  • Research Article
  • 10.1038/s41598-025-25911-7
Regional distribution and isotope ratios of radiocesium from the Fukushima Daiichi nuclear power station and global fallout in Tokai-mura.
  • Nov 7, 2025
  • Scientific reports
  • Asako Shimada + 5 more

  • New
  • Research Article
  • 10.1038/s41598-025-26058-1
Nature-inspired swarm optimization paradigms for securing semantic web frameworks against DDoS attacks: a computational approach.
  • Nov 7, 2025
  • Scientific reports
  • Chirag Ganguli + 3 more

  • New
  • Research Article
  • 10.1038/s41598-025-26478-z
Unifying graph neural networks causal machine learning and conformal prediction for robust causal inference in rail transport systems.
  • Nov 7, 2025
  • Scientific reports
  • Mehmet Taciddin Akçay

  • New
  • Research Article
  • 10.1038/s41598-025-26168-w
Density functional theory study of mechanical, thermal, and thermodynamic properties of zinc-blende CdS and CdSe.
  • Nov 7, 2025
  • Scientific reports
  • Teshome Gerbaba Edossa

  • New
  • Research Article
  • 10.1038/s41598-025-25891-8
P2X7 and inflammatory fingerprinting of patients with carotid atherosclerosis and the risk of abdominal aortic aneurysm.
  • Nov 7, 2025
  • Scientific reports
  • Maria Lombardi + 8 more

  • New
  • Research Article
  • 10.1038/s41598-025-25690-1
Machine learning and bayesian network based on fuzzy AHP framework for risk assessment in process units.
  • Nov 7, 2025
  • Scientific reports
  • Hassan Mandali + 5 more

  • New
  • Research Article
  • 10.1038/s41598-025-23455-4
Multi-output deep learning for high-frequency prediction of air and surface temperature in Kuwait.
  • Nov 7, 2025
  • Scientific reports
  • Shehroz S Khan + 1 more
