Abstract

Particle number concentration (PNC) is an important metric of atmospheric particle pollution. However, methods tailored to preprocessing PNC data and analyzing its characteristics and sources are rare, and interpreting the data is time-consuming and inefficient. In this method-oriented study, we develop and evaluate techniques in R (open-source software for statistics and graphics) that combine flexible conditions, C++-optimized algorithms, and parallel computing to address these challenges. The preprocessing methods comprise deletion of variables and observations, outlier removal, and interpolation of missing values (NA). Compared with previous methods, they clean the data more thoroughly, retain more samples, and introduce no new outliers during interpolation. In addition, automatically dividing PNC pollution events based on relative values suits the properties of PNC and highlights pollution characteristics related to sources and mechanisms. We also tested and compared the basic functions of k-means clustering, Principal Component Analysis (PCA), Factor Analysis (FA), Positive Matrix Factorization (PMF), and a newly introduced model, Non-negative Matrix Factorization (NMF), for PNC source analysis. Only PMF and NMF identified coal heating and produced more interpretable results; of the two, NMF apportioned sources more distinctly and ran 11–28 times faster than PMF. Traffic contributions are interannually stable in non-heating periods, and traffic is always the dominant source. The contribution of coal heating decreased by 40%–86% over the five most recent heating periods, reflecting the effectiveness of coal-burning controls.
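
The abstract does not say which NMF algorithm or R package was used. As a minimal sketch of the general idea only, the Python code below factors a non-negative data matrix V (assumed here to hold samples in rows and particle size bins in columns) into non-negative source contributions W and source profiles H, using Lee and Seung's multiplicative updates for the Frobenius-norm objective. The function names (`nmf`, `matmul`, `transpose`, `frob_error`), the iteration count, and the two synthetic profiles in the usage example are hypothetical illustrations, not the study's method or data.

```python
import random

# Hypothetical helpers: plain nested-list linear algebra (no NumPy needed).
def matmul(A, B):
    """Matrix product of an (n x k) and a (k x m) nested list."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Sketch: factor V (samples x size bins) as W @ H with W, H >= 0.

    Lee-Seung multiplicative updates minimising ||V - WH||_F.
    W holds per-sample source contributions, H per-source size profiles
    (an assumed interpretation for PNC data, not taken from the study).
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def frob_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vrow, rrow in zip(V, R)
               for v, r in zip(vrow, rrow)) ** 0.5

if __name__ == "__main__":
    # Synthetic, hypothetical data: two made-up source profiles over 4 size bins.
    H_true = [[5.0, 3.0, 1.0, 0.2], [0.5, 1.0, 3.0, 4.0]]
    W_true = [[1.0, 0.1], [0.8, 0.5], [0.2, 1.2], [0.6, 0.6], [1.5, 0.3]]
    V = matmul(W_true, H_true)
    W, H = nmf(V, rank=2, iters=2000)
    print("residual norm:", frob_error(V, W, H))
```

Unlike PMF, which weights each residual by the measurement uncertainty of the corresponding data point, this plain NMF objective treats all residuals equally; as in PMF, the number of factors (`rank`) still has to be chosen by the analyst.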
