Finding causes of outliers in multivariate environmental data

Forest C Garner, Kirk E Fitzgerald, Martin A Stapanian

doi:10.1002/cem.1180050311

Abstract

Multivariate outliers in environmental data sets are often caused by atypical measurement error in a single variable. From a quality assurance perspective, it is important to identify these variables efficiently so that corrective actions may be performed. We demonstrate a procedure for using two multivariate tests to identify which variable ‘caused’ each outlier. The procedure is tested with simulated data sets that have the same correlation structure as selected water chemistry variables from a survey of lakes in the Western United States. The success rates are evaluated for three of the variables for sample sizes of 50 and 100, significance levels of 0.01 and 0.05, and various amounts of mean shift. The procedure works best for highly correlated variables.
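
The abstract does not name the two multivariate tests, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes the squared Mahalanobis distance as the outlier measure and a leave-one-variable-out comparison to pick the suspect variable. The simulated data, the correlation structure, the size of the mean shift, and every function name are hypothetical, and the formal significance tests are omitted.

```python
import random

def covariance(rows):
    """Return column means and the sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return means, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis_sq(rows, keep):
    """Squared Mahalanobis distance of every row, using only columns in `keep`."""
    sub = [[r[j] for j in keep] for r in rows]
    means, cov = covariance(sub)
    out = []
    for r in sub:
        d = [v - mu for v, mu in zip(r, means)]
        out.append(sum(di * si for di, si in zip(d, solve(cov, d))))
    return out

def suspect_variable(rows, i):
    """Assumed heuristic: the variable whose removal leaves observation i
    with the smallest distance is the one most likely to have caused it."""
    p = len(rows[0])
    reduced = {}
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        reduced[j] = mahalanobis_sq(rows, keep)[i]
    return min(reduced, key=reduced.get)

# Hypothetical data: three highly correlated variables, 50 observations.
rng = random.Random(0)
rows = []
for _ in range(50):
    z = rng.gauss(0, 1)
    rows.append([z, z + 0.3 * rng.gauss(0, 1), z + 0.3 * rng.gauss(0, 1)])
rows[0][1] += 5.0  # inject a mean shift into variable 1 of observation 0

d2 = mahalanobis_sq(rows, [0, 1, 2])
outlier = max(range(len(rows)), key=d2.__getitem__)  # most extreme observation
culprit = suspect_variable(rows, outlier)
print(outlier, culprit)
```

Because the three simulated variables are strongly correlated, a shift in one of them breaks the expected relationship with the other two, which is why dropping that variable brings the distance back to an ordinary value. With weakly correlated variables the reduced distances differ much less, consistent with the abstract's remark that the procedure works best for highly correlated variables.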
