Abstract

Severe haze episodes have occurred periodically in Southeast Asia, with adverse effects on Malaysia in particular. A technique called cluster analysis was used to analyze these occurrences. Traditional cluster analysis, in particular hierarchical agglomerative cluster analysis (HACA), was applied directly to the data sets. However, the data sets may contain hidden patterns that can be explored. In this paper, this underlying information was captured via persistent homology, a topological data analysis (TDA) tool that extracts topological features of the data sets, including components, holes, and cavities. In particular, an improved version of HACA was proposed by combining HACA with persistent homology. Additionally, a comparative study between traditional HACA and improved HACA was carried out using data on particulate matter, the major pollutant recorded during haze episodes at the Klang, Petaling Jaya, and Shah Alam air quality monitoring stations. The effectiveness of the two clustering approaches was evaluated based on their ability to cluster the months according to the haze condition. The results showed that clustering based on topological features via the improved HACA approach was able to correctly group the months with severe haze, in contrast to clustering without such features, and these results were consistent across all three locations.

Highlights

  • Haze occurs in Southeast Asia including Malaysia almost every year

  • Air pollution in Malaysia has been dominated by haze episodes, which have caused negative health impacts on humans such as asthma attacks, chronic bronchitis, and acute respiratory infection [3]

  • Severe haze episodes were reported by the Department of Environment (DOE) Malaysia in 2005, 2013, 2014, and 2015 [6]


Summary

Introduction

Haze occurs in Southeast Asia, including Malaysia, almost every year. It is a weather-related phenomenon in which solid and liquid particles, smoke, and vapor are present in the atmosphere, reducing atmospheric visibility to less than 10 km [1,2]. Several studies used HACA to cluster air quality data based on the days, months, and years of air pollution episodes. Mutalib et al. [18] used HACA on air pollutant parameters to cluster them by months and years and validated the times at which the haze episodes occurred. To the best of our knowledge, cluster analysis with topological information, specifically HACA combined with persistent homology, has not yet been explored in air quality studies. The similarity relationships among the observations in the data sets can be obtained using hierarchical agglomerative cluster analysis (HACA) [41]. After each merge produced a new set of distances, the rule of choosing the two closest members was applied again. This process was repeated until a single cluster containing all the observations was formed.
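
The agglomerative procedure described above (merge the two closest members, recompute the distances, repeat until one cluster remains) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `haca`, the choice of single linkage with Euclidean distance, and the sample points are all assumptions made for illustration, since the excerpt does not state which linkage or distance was used.

```python
from itertools import combinations
from math import dist


def haca(points, linkage=min):
    """Naive hierarchical agglomerative clustering (illustrative only).

    Returns the merges as (cluster_a, cluster_b, distance) tuples, where
    each cluster is a tuple of observation indices.
    linkage=min gives single linkage; linkage=max gives complete linkage.
    Both are assumptions; the source does not name the linkage rule.
    """
    # Start with every observation in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        # Distance between two clusters under the chosen linkage rule.
        return linkage(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Choose the two closest clusters among all current pairs.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        # Replace the pair with its union; the distances to the new
        # cluster are recomputed on the next pass of the loop.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical 2-D feature vectors, one per observation (not paper data).
observations = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
merges = haca(observations)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The list of merges and their distances carries the same information as a dendrogram: cutting it at a chosen distance threshold yields the clusters. Recomputing every pairwise distance on each pass keeps the sketch short but scales poorly; a library routine would be preferable for real monitoring data.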

Persistent Homology
HACA with Persistent Homology
Results and Discussions