Abstract

Air pollution monitoring data such as PM10, sulphur dioxide, ozone and carbon monoxide are obtained from automated monitoring stations. These data usually contain missing values due to machine failure, routine maintenance, changes in the siting of monitors and human error. It is therefore important to find the best way to estimate these missing values so that the data analysed are of high quality. Incomplete data matrices are problematic: incomplete datasets may lead to results that differ from those that would have been obtained from a complete dataset (Hawthorne and Elliott, 2004). Three major problems may arise when dealing with incomplete data. First, there is a loss of information and, as a consequence, a loss of efficiency. Second, there are several complications related to data handling, computation and analysis, due to irregularities in the data structure and the impossibility of using standard software. Third, and most importantly, there may be bias due to systematic differences between observed and unobserved data. One approach to solving incomplete data problems is the adoption of imputation techniques (Junninen et al., 2004). Thus, this study compared the performance of the linear interpolation method (an imputation technique) with substitution of the mean value for replacing missing values in environmental data sets.
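The two replacement methods named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the use of None to mark a missing reading, the choice to leave gaps at the start or end of the series unfilled, and the sample PM10 values are all assumptions made for illustration only.

```python
from typing import List, Optional

# A series of readings; None marks a missing value (an assumed convention).
Series = List[Optional[float]]


def mean_substitute(values: Series) -> Series:
    """Replace every missing reading with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing can be estimated
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values: Series) -> Series:
    """Fill each gap on a straight line between its two observed neighbours.

    Gaps before the first or after the last observed reading are left
    as None, since they have only one neighbour (an assumed choice).
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical hourly PM10 readings with three missing values.
pm10 = [40.0, None, None, 70.0, 60.0, None, 50.0]

print(mean_substitute(pm10))
print(linear_interpolate(pm10))
```

In this sketch, mean substitution assigns the same value to every gap regardless of where it falls in the series, whereas linear interpolation uses only the readings on either side of each gap.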
