Abstract

Nowadays air quality data can be easily accumulated by sensors around the world. Analysis on air quality data is very useful for society decision. Among five major air pollutants which are calculated for AQI (Air Quality Index), PM2.5 data is the most concerned by the people. PM2.5 data is also cross-impacted with the other factors in the air and which has properties of non-linear non-stationary including high noise level and outlier. Traditional methods cannot solve the problem of PM2.5 data clustering very well because of their inherent characteristics. In this paper, a novel model-based feature extraction method is proposed to address this issue. The EPLS model includes: (1) Mode Decomposition, in which EEMD algorithm is applied to the aggregation dataset; (2) Dimension Reduction, which is carried out for a more significant set of vectors; (3) Least Squares Projection, in which all testing data are projected to the obtained vectors. Synthetic dataset and air quality dataset are applied to different clustering methods and similarity measures. Experimental results demonstrate that EPLS is efficient in dealing with high noise level and outlier air quality clustering problems, and which can also be adapted to various clustering techniques and distance measures.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.