Abstract

Groundwater chemistry data are normally scarce in remote inland areas, so effective statistical approaches are needed to extract information about hydrochemical processes from the limited data. This study applied a clustering approach based on the Gaussian Mixture Model (GMM) to a hydrochemical dataset of groundwater collected in the middle Heihe River Basin (HRB) of northwestern China. Independent hydrological data were introduced to examine whether the clustering results led to an appropriate interpretation of the hydrochemical processes. The main findings are as follows. First, although groundwater chemistry in the middle HRB primarily reflects a natural salinization process, there is evidence of significant anthropogenic influence, such as irrigation and fertilization. Second, the regional hydrological cycle, particularly surface water-groundwater interaction, has a profound and spatially variable impact on groundwater chemistry. Third, the interaction between regional agricultural development and groundwater quality is complicated. Overall, this study demonstrates that GMM clustering can effectively analyze hydrochemical datasets and that the clustering results can provide insights into hydrochemical processes, even with a limited number of observations. The clustering approach introduced in this study represents a cost-effective way to investigate groundwater chemistry in remote inland areas where groundwater monitoring is difficult and costly.

Highlights

  • Multivariate statistics have been widely used to analyze complex and high-dimensional datasets in hydrological research [1,2,3,4]

  • PC1, PC2 and PC3 form the best set of attributes, and the appropriate number of clusters is 6

  • Principal component analysis (PCA) effectively reduces the data dimensions from eight to three and significantly improves the overall clustering performance
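
The workflow summarized in the highlights (reduce eight variables to three principal components, then cluster with a GMM) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the function name `cluster_hydrochemistry` is hypothetical, the log transform and standardization are assumed preprocessing steps, and the use of the Bayesian information criterion (BIC) to pick the number of clusters is an assumption, since this section does not state how the value of 6 was selected.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_hydrochemistry(X, n_pcs=3, max_clusters=8, seed=0):
    """Hypothetical helper: PCA reduction followed by GMM clustering.

    X is an (n_samples, n_variables) array. Returns hard labels, soft
    membership probabilities, the selected number of clusters, and the
    explained-variance ratio of the retained components.
    """
    # Standardize so that no single variable dominates the PCA (assumed step).
    Z = StandardScaler().fit_transform(X)

    # Keep the first n_pcs principal components (three in the highlights).
    pca = PCA(n_components=n_pcs)
    scores = pca.fit_transform(Z)

    # Fit GMMs with 1..max_clusters components and keep the lowest BIC.
    # BIC is an assumed selection criterion, not taken from the paper.
    best_gmm, best_bic = None, np.inf
    for k in range(1, max_clusters + 1):
        gmm = GaussianMixture(
            n_components=k, covariance_type="full", n_init=5, random_state=seed
        ).fit(scores)
        bic = gmm.bic(scores)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic

    labels = best_gmm.predict(scores)       # hard assignment per sample
    resp = best_gmm.predict_proba(scores)   # soft membership probabilities
    return labels, resp, best_gmm.n_components, pca.explained_variance_ratio_


# Synthetic stand-in for a small dataset with eight variables per sample.
rng = np.random.default_rng(42)
X = rng.lognormal(mean=3.0, sigma=1.0, size=(120, 8))
labels, resp, best_k, evr = cluster_hydrochemistry(np.log10(X))
print("selected clusters:", best_k)
print("variance explained by retained PCs:", evr.round(2))
```

Because a GMM is a probability model, each sample receives a membership probability for every cluster (`resp`) in addition to a single label, which can help flag samples that sit between hydrochemical groups.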


Summary

Introduction

Multivariate statistics have been widely used to analyze complex and high-dimensional datasets in hydrological research [1,2,3,4]. Clustering, a robust classification scheme for partitioning a dataset into homogeneous groups [5], is a typical multivariate statistical technique that has been used in numerous hydrological applications, such as rainfall intensity estimation [6], drought frequency analysis [7], stream turbidity prediction [8] and watershed regionalization [9]. Different clustering methods have been used to analyze multivariate hydrochemical groundwater data [1,5,14,15,16,17]. These clustering methods fall into two main categories: heuristic data mining methods and probability model-based methods [6]. Clustering methods can also be classified as “crisp” or “hard” methods (i.e., an observation belongs exclusively to a single cluster) and “fuzzy” or “soft”
