Abstract

Outliers are often present in large datasets of air pollutant concentrations. Existing methods for detecting outliers in environmental data fall into three groups, depending on the character of the data: methods for time series, methods for time series measured simultaneously with accompanying variables, and methods for spatial data. Many methods proposed for the automatic detection of outliers in time series are limited by the assumption that the distribution of the analysed variable is known. Because environmental variables are often influenced by accompanying factors, their distribution is difficult to estimate. Using the available information about accompanying variables, together with methods designed for time series measured simultaneously with such variables, can significantly improve outlier detection. This paper presents a method for the automatic detection of outliers in PM10 aerosols measured simultaneously with accompanying variables. The method is based on a generalised linear model and subsequent analysis of its residuals, and it exploits the additional information provided by the accompanying variables. The results of the proposed procedure are compared with those obtained using two distribution-free outlier detection methods for time series previously suggested by the authors. A simulation-based comparison of all three procedures showed that the procedure presented in this paper effectively detects outliers that deviate by at least 5 standard deviations from the mean value of the neighbouring observations, and that it outperforms both distribution-free outlier detection methods for time series.
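The general idea of regressing the pollutant on an accompanying variable and then screening the residuals can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the model family, the covariates, or the residual rule. The sketch assumes a single hypothetical covariate (`wind_speed`), synthetic log-scale PM10 values, an ordinary least-squares fit (the Gaussian, identity-link special case of a generalised linear model), and a robust cut-off of 5 scaled median absolute deviations chosen only to echo the 5 standard deviation figure above.

```python
import random
import statistics


def fit_ols(x, y):
    """Least-squares fit of y ~ a + b*x (Gaussian GLM with identity link)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def flag_outliers(x, y, threshold=5.0):
    """Return indices whose residual exceeds `threshold` robust standard deviations."""
    a, b = fit_ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    med = statistics.median(resid)
    # Median absolute deviation, scaled to match the standard deviation
    # under normality (an assumption of this sketch).
    mad = statistics.median([abs(r - med) for r in resid])
    scale = 1.4826 * mad
    return [i for i, r in enumerate(resid) if abs(r - med) > threshold * scale]


# Synthetic illustration only: all values below are invented.
rng = random.Random(0)
wind_speed = [rng.uniform(0.5, 8.0) for _ in range(200)]
log_pm10 = [3.5 - 0.15 * w + rng.gauss(0.0, 0.2) for w in wind_speed]
log_pm10[50] += 3.0  # inject one gross error at a hypothetical index

flagged = flag_outliers(wind_speed, log_pm10)
print(flagged)
```

A robust scale estimate is used here so that the injected outlier does not inflate the spread against which it is judged. A fuller version would replace the least-squares fit with a GLM family suited to positive, skewed concentrations and include several accompanying variables.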
