Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment

Gunnar Lischeid

doi:10.1007/s00477-008-0266-y

Abstract

Environmental monitoring programs provide large multivariate data sets that usually cover considerable spatial and temporal variability. The apparent complexity of these data sets requires sophisticated tools for their processing. Usually, fixed schemes are followed, including the application of numerical models, which are increasingly implemented in decision support systems. However, these schemes are too rigid to detect unexpected features, such as the onset of subtle trends, non-linear relationships, or patterns that are restricted to limited sub-samples of the total data set. In this study, an alternative approach is followed, based on an efficient non-linear visualization of the data. Visualization is the most powerful interface between computer and human brain. The idea is to apply an efficient and model-free tool, that is, one that requires no prior assumptions about key properties of the data, such as the dominant processes. In other words, the data processing aimed at preserving a maximum amount of information and at leaving it to the expert to decide which features to analyze in more detail. A comprehensive data set from a 15-year monitoring program in the Lehstenbach watershed was used. The watershed is located in the Fichtelgebirge, a mountainous region in southern Germany where the land is used for forestry. Streamwater and groundwater have been monitored at 38 sampling sites for 13 parameters. The data set was analyzed using a self-organizing map (SOM) combined with Sammon’s mapping. The resulting 2D non-linear projection represented 89% of the variance of the data set. The visualization enabled easy detection of outliers, assessment of spatial versus temporal variance, and verification of a predefined classification of the sampling sites. Contamination of two of the observation wells was detected.
Long-term trends of solute concentrations in the catchment runoff could be distinguished from short-term dynamics, and a long-term shift in the dynamics was identified separately for different flow regimes. This analysis considerably improved the understanding of the system’s behavior, helped to detect “hot spots”, and made it possible to organize subsequent analyses of the data very efficiently.
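The abstract does not state how the SOM was configured or which Sammon algorithm was used, so the following is only a minimal pure-Python sketch of one common way to combine the two: train a SOM on the (assumed standardized) samples, then project its prototype vectors to 2D with Sammon’s mapping. Everything specific here is an assumption: the rectangular grid, Gaussian neighbourhood, linearly decaying learning rate and radius, the hypothetical function names `train_som`, `sammon_map` and `sammon_stress`, and the synthetic data. Sammon’s stress is minimized by plain gradient descent with a backtracking step size, which is a simplification of Sammon’s original pseudo-Newton update.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _pairwise(points):
    """Full matrix of pairwise Euclidean distances."""
    return [[_dist(p, q) for q in points] for p in points]


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Online (Kohonen) training of a rectangular self-organizing map.

    Returns one prototype vector per map node, in row-major order.
    """
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in nodes]  # init from data samples
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly towards 0
        sigma = 0.5 + sigma0 * (1.0 - frac)  # neighbourhood radius shrinks towards 0.5
        x = rng.choice(data)
        # best-matching unit: the prototype closest to the sample
        bmu = min(range(len(nodes)), key=lambda k: _dist(weights[k], x))
        br, bc = nodes[bmu]
        for k, (r, c) in enumerate(nodes):
            # Gaussian neighbourhood on the map grid around the BMU
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


def sammon_stress(dstar, y):
    """E = (1/c) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij, with c = sum_{i<j} d*_ij."""
    d = _pairwise(y)
    num = c = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if dstar[i][j] > 0:  # pairs of identical input vectors are skipped
                c += dstar[i][j]
                num += (dstar[i][j] - d[i][j]) ** 2 / dstar[i][j]
    return num / c if c > 0 else 0.0


def sammon_map(vectors, n_iter=200, step=1.0, seed=0):
    """Project vectors to 2D by gradient descent on Sammon's stress.

    Returns (coords, stress_history); the history never increases because
    only steps that lower the stress are accepted.
    """
    rng = random.Random(seed)
    n = len(vectors)
    dstar = _pairwise(vectors)
    c = sum(dstar[i][j] for i in range(n) for j in range(i + 1, n)) or 1.0
    y = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    stress = sammon_stress(dstar, y)
    history = [stress]
    for _ in range(n_iter):
        d = _pairwise(y)
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or dstar[i][j] == 0:
                    continue
                dij = max(d[i][j], 1e-12)
                coef = (dstar[i][j] - dij) / (dstar[i][j] * dij)
                for k in range(2):
                    # dE/dy_i = -(2/c) * sum_j coef_ij * (y_i - y_j)
                    grad[i][k] -= (2.0 / c) * coef * (y[i][k] - y[j][k])
        trial = [[y[i][k] - step * grad[i][k] for k in range(2)] for i in range(n)]
        trial_stress = sammon_stress(dstar, trial)
        if trial_stress < stress:  # accept and try a larger step next time
            y, stress, step = trial, trial_stress, step * 1.2
        else:                      # reject and shrink the step
            step *= 0.5
        history.append(stress)
    return y, history


if __name__ == "__main__":
    # Hypothetical stand-in for a standardized samples x 13-parameter matrix;
    # NOT the Lehstenbach data.
    rng = random.Random(42)
    data = [[rng.gauss(0.0 if s % 2 else 3.0, 1.0) for _ in range(13)] for s in range(200)]
    prototypes = train_som(data, rows=5, cols=5)
    coords, stress = sammon_map(prototypes)
    print(f"{len(prototypes)} prototypes, final Sammon stress {stress[-1]:.3f}")
```

Projecting the SOM prototypes instead of every sample keeps the quadratic cost of Sammon’s mapping small. Note that Sammon’s stress is a different measure from the “89% of the variance” reported in the abstract, whose exact definition is not given there.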
