Abstract

Atmospheric new-particle formation (NPF) is a highly non-linear process that involves the atmospheric chemistry of precursors, clustering physics and subsequent growth before NPF can be observed. Owing to ongoing efforts, a tremendous amount of atmospheric data now exists, obtained through continuous direct measurements of the atmosphere. This volume of data makes manual analysis difficult but, on the other hand, enables the use of modern data science techniques. Here, we calculate and explore the mutual information (MI) between observed NPF events (measured at Hyytiälä, Finland) and a wide variety of simultaneously monitored ambient variables: trace gas and aerosol particle concentrations, meteorology, radiation and a few derived quantities. The purpose of the investigation is to identify key factors contributing to NPF. The MI method finds that the formation events are strongly linked to sulfuric acid concentration and water content, ultraviolet radiation, condensation sink (CS) and temperature. These quantities have previously been well established as important players in the phenomenon through dedicated field, laboratory and theoretical research. The novelty of this work is to demonstrate that the same results can now be obtained by a data analysis method that operates without supervision and without requiring a deep understanding of the underlying physics. This suggests that the method is suitable for wide application in atmospheric science to discover other interesting phenomena and their relevant variables.
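The idea of ranking ambient variables by their MI with NPF events can be illustrated with a minimal sketch. It assumes a simple plug-in estimator on equal-frequency bins (not necessarily the estimator used in this work), and the helper `mi_binned`, the variables `h2so4`, `cs` and `noise`, and the toy event rule are all hypothetical; the data are synthetic, not measured Hyytiälä data.

```python
import numpy as np


def mi_binned(x, labels, n_bins=10):
    """Plug-in MI estimate (nats) between a continuous variable and discrete labels.

    The continuous variable is discretised into equal-frequency (quantile) bins.
    """
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    x_bin = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    _, label_idx = np.unique(labels, return_inverse=True)
    # Joint counts of (x bin, label), then joint and marginal probabilities
    joint = np.zeros((n_bins, label_idx.max() + 1))
    np.add.at(joint, (x_bin, label_idx), 1)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])))


# Hypothetical synthetic data, NOT measured Hyytiälä data
rng = np.random.default_rng(0)
n_days = 2000
h2so4 = rng.lognormal(size=n_days)  # stand-in for sulfuric acid concentration
cs = rng.lognormal(size=n_days)     # stand-in for condensation sink
noise = rng.normal(size=n_days)     # variable unrelated to the events
# Toy event rule (assumption): events favoured by high h2so4 and low cs
event = (np.log(h2so4) - np.log(cs) + 0.5 * rng.normal(size=n_days)) > 0.5

variables = {"h2so4": h2so4, "cs": cs, "noise": noise}
ranking = sorted(variables, key=lambda name: mi_binned(variables[name], event), reverse=True)
for name in ranking:
    print(f"{name:6s} MI = {mi_binned(variables[name], event):.3f} nats")
```

The unrelated variable should end up at the bottom of the ranking; the absolute MI values depend on the estimator and bin count, so only their order is meaningful here.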

Highlights

  • New-particle formation (NPF) is an important source of aerosol particles and cloud condensation nuclei (CCN) and occurs in a vast number of atmospheric environments, ranging from remote continental areas to heavily polluted urban centres (Kulmala et al, 2004; Dunne et al, 2016; Wang et al, 2017).

  • At least part of the appeal of the mutual information (MI) method comes from its ability to effectively measure non-linear correlation between data sets (Steuer et al, 2002; Chen et al, 2010). In this respect, MI is superior to the standard Pearson correlation coefficient (PCC) (Pearson, 1895), which is only suitable for measuring linear correlation (Wang et al, 2015).

  • There is no specific MI level or threshold that indicates a correlation between variables. This is similar to the Pearson correlation, whose value gives only an indication of the relationship between the variables.
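The contrast between MI and the Pearson correlation coefficient can be shown on synthetic data. The sketch below assumes a simple histogram (plug-in) MI estimator with a hypothetical helper `mi_hist` and 16 bins per axis, and compares a linear, a symmetric quadratic and an independent relationship.

```python
import numpy as np


def mi_hist(x, y, bins=16):
    """Plug-in MI estimate (nats) from a 2-D histogram of (x, y)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = p_xy > 0  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])))


rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 5000)
y_lin = x + 0.1 * rng.normal(size=x.size)      # linear dependence
y_sq = x**2 + 0.05 * rng.normal(size=x.size)   # non-linear, symmetric dependence
y_ind = rng.uniform(-1.0, 1.0, x.size)         # independent of x

for name, y in [("linear", y_lin), ("quadratic", y_sq), ("independent", y_ind)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:12s} Pearson r = {r:+.2f}   MI = {mi_hist(x, y):.2f} nats")
```

For the symmetric quadratic case, Pearson r is close to zero while the MI estimate is far larger than for the independent pair. The independent pair still receives a small positive MI, because the plug-in estimator is biased upward for finite samples; this illustrates why MI values are best compared against each other rather than against a fixed threshold.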


Summary

Introduction

New-particle formation (NPF) is an important source of aerosol particles and cloud condensation nuclei (CCN) and occurs in a vast number of atmospheric environments, ranging from remote continental areas to heavily polluted urban centres (Kulmala et al, 2004; Dunne et al, 2016; Wang et al, 2017). In one earlier study, the data were measured at three polluted sites: the Po Valley, Italy, and Melpitz and Hohenpeissenberg, Germany. In that study, the authors applied a multivariate non-linear mixed effects model to examine the variables affecting the number concentration of Aitken particles (50 nm). To understand the effects of atmospheric variables on NPF in Hyytiälä, Finland, a comprehensive study was carried out by Hyvönen et al (2005). They applied two main types of data mining methods to 8 years of continuous measurements of 80 variables.

The goal of the present study is to find the most relevant atmospheric variables in relation to NPF events using a data-driven, information-theoretic method based on the data set measured at Hyytiälä, Finland. We use the data measured during the years 1996–2014 at the Station for Measuring Forest Ecosystem-Atmosphere Relations (SMEAR) II in Hyytiälä, Finland, operated by Helsinki University (SMEAR website, 2017).

Sampling site
Measured variables
Derived variables
Data preprocessing
Information theory: a brief introduction
Entropy
Mutual information
Mutual information implementation: nearest-neighbour method
Mutual information: a simulation case study
Correlation analysis between atmospheric variables and NPF
Scatter plot analysis
Conclusions
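The section list includes a nearest-neighbour implementation of mutual information. Because the estimator details are not given here, the sketch below assumes the widely used estimator of Kraskov, Stögbauer and Grassberger (2004), algorithm 1 (KSG). The helper names `ksg_mi` and `digamma_int` are hypothetical, and the brute-force O(N²) distance matrices limit it to a few thousand samples.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return -EULER_GAMMA + harmonic[n - 1]


def ksg_mi(x, y, k=3):
    """KSG (algorithm 1) mutual information estimate in nats.

    Assumes continuous samples without duplicate points; uses the max-norm.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Pairwise max-norm distances in each marginal space (O(N^2) memory)
    dx = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    dy = np.abs(y[:, None, :] - y[None, :, :]).max(axis=2)
    dz = np.maximum(dx, dy)       # max-norm distance in the joint (x, y) space
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbour
    # Distance from each point to its k-th nearest neighbour in the joint space
    eps = np.sort(dz, axis=1)[:, k - 1]
    # Marginal neighbours strictly closer than eps, minus the point itself
    nx = np.maximum((dx < eps[:, None]).sum(axis=1) - 1, 0)
    ny = np.maximum((dy < eps[:, None]).sum(axis=1) - 1, 0)
    return float(
        digamma_int(k) + digamma_int(n)
        - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1))
    )


# Check against the analytic MI of a correlated bivariate Gaussian
rng = np.random.default_rng(2)
rho = 0.8
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
est = ksg_mi(xy[:, 0], xy[:, 1], k=3)
exact = -0.5 * np.log(1.0 - rho**2)
print(f"KSG estimate: {est:.3f} nats, exact: {exact:.3f} nats")
```

For a bivariate Gaussian with correlation rho, the exact MI is -0.5 ln(1 - rho^2), which gives a convenient sanity check. Unlike the histogram approach, the nearest-neighbour estimator needs no bin width, only the neighbour count k.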
