Dunn Index Research Articles

Abstract. Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations, and analysing them requires an efficient approach that delivers results in the shortest possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have recently been employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study applies several clustering techniques (K-means, PAM, CLARA and SOM) to PNSD data measured at 25 sites across Brisbane, Australia, in order to identify and apply the optimum technique. A new method, based on a Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data, and the temporal weight of each cluster was also estimated using the GAM. The performances of the four clustering techniques were compared using Dunn index and silhouette width validation values: the K-means technique performed best, with five clusters being optimal. The diurnal occurrence of each cluster was then used together with other air quality parameters, temporal trends, the physical properties of each cluster and the results of the GAM parameterisation to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins: regional background particles, photochemically induced nucleated particles and vehicle-generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source, and the GAM was suitable for parameterising the PNSD data. Together, these two techniques can substantially help researchers analyse PNSD data for characterisation and source apportionment purposes.
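The model-selection step described above can be sketched as follows. Candidate partitions are scored with the Dunn index and the mean silhouette width, and the number of clusters with the best score is kept. This is a minimal illustration under stated assumptions, not the authors' code:

* It uses plain Lloyd's K-means with a deterministic farthest-first initialisation and Euclidean distance.
* It runs on a hypothetical synthetic dataset of three 2-D blobs (`make_blobs`) rather than real PNSD spectra.
* All function names are illustrative.
* The Dunn index is the classic form: the smallest distance between points in different clusters divided by the largest within-cluster diameter.

For both scores, higher values indicate more compact, better-separated clusters.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def make_blobs(seed: int = 1) -> List[Tuple[float, float]]:
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs, 20 points each."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(20)]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-means with farthest-first seeding (an assumed, deterministic choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


def dunn_index(points: List[Point], labels: List[int]) -> float:
    """Min inter-cluster point distance / max intra-cluster diameter (higher is better)."""
    min_sep, max_diam = math.inf, 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_diam = max(max_diam, d)
            else:
                min_sep = min(min_sep, d)
    return min_sep / max_diam if max_diam > 0 else math.inf


def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the rest of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def evaluate(points: List[Point], ks: Iterable[int]) -> Dict[int, Tuple[float, float]]:
    """Map each candidate k to its (Dunn index, mean silhouette) pair."""
    scores = {}
    for k in ks:
        labels = kmeans(points, k)
        scores[k] = (dunn_index(points, labels), mean_silhouette(points, labels))
    return scores


if __name__ == "__main__":
    for k, (dunn, sil) in evaluate(make_blobs(), range(2, 6)).items():
        print(f"k={k}: Dunn={dunn:.3f}, silhouette={sil:.3f}")
```

On this synthetic data, both scores should peak at k = 3, which is the number of blobs that generated it. Partitions from PAM, CLARA or SOM could be scored with the same two functions, because they need only the points and one label per point. The pairwise loops are O(n²), so long PNSD records would call for subsampling or a vectorised implementation.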

Problem statement: Clustering and visualizing high-dimensional dynamic data is a challenging problem. Most existing clustering algorithms are based on static statistical relationships among the data. Dynamic clustering is a mechanism for adapting to and discovering clusters in real-time environments. Many applications, such as incremental data mining in data warehousing and sensor networks, rely on dynamic data clustering algorithms.
Approach: In this work, we present a density-based dynamic data clustering algorithm for clustering an incremental dataset, and we compare its performance with full runs of standard DBSCAN and Chameleon on the dynamic dataset. Most clustering algorithms perform well when evaluated by clustering accuracy, which is calculated from the original class labels and the computed class labels. However, measuring performance with a cluster validation metric can give a different result.
Results: This study addresses the problem of clustering a dynamic dataset that grows over time as more data are added. To evaluate the performance of the algorithms, we used the Generalized Dunn Index (GDI) and the Davies-Bouldin (DB) index as cluster validation metrics, together with the time taken for clustering.
Conclusion: In this study, we successfully implemented and evaluated the proposed density-based dynamic clustering algorithm. Its performance was compared with that of the Chameleon and DBSCAN clustering algorithms, and the proposed algorithm performed very well in terms of both clustering accuracy and speed.
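The two validation metrics named in the Results can be computed from the points and their cluster labels alone. The sketch below is an illustration under assumptions, not the paper's implementation:

* It uses Euclidean distance, and the function names are hypothetical.
* "Generalized Dunn Index" covers a family of variants. The sketch assumes the variant that measures separation by average linkage (the mean distance between points of two clusters) and diameter by twice the mean distance of a cluster's points to its centroid.

A lower Davies-Bouldin value and a higher GDI both indicate better clustering.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Point = Sequence[float]


def _group(points: List[Point], labels: List[int]) -> List[List[Point]]:
    """Split points into one list per cluster label."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    return list(clusters.values())


def _centroid(members: List[Point]) -> List[float]:
    return [sum(coord) / len(members) for coord in zip(*members)]


def _scatter(members: List[Point], centre: List[float]) -> float:
    """Mean distance of a cluster's points to its centroid."""
    return sum(math.dist(p, centre) for p in members) / len(members)


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better.
    Assumes no two clusters share the same centroid."""
    clusters = _group(points, labels)
    centres = [_centroid(m) for m in clusters]
    scatters = [_scatter(m, c) for m, c in zip(clusters, centres)]
    worst = [
        max((scatters[i] + scatters[j]) / math.dist(centres[i], centres[j])
            for j in range(len(clusters)) if j != i)
        for i in range(len(clusters))
    ]
    return sum(worst) / len(worst)


def generalized_dunn(points: List[Point], labels: List[int]) -> float:
    """Assumed GDI variant: min average-linkage separation / max (2 x mean
    distance to centroid); higher is better."""
    clusters = _group(points, labels)

    def avg_linkage(a: List[Point], b: List[Point]) -> float:
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    min_sep = min(avg_linkage(a, b) for a, b in combinations(clusters, 2))
    max_diam = max(2 * _scatter(m, _centroid(m)) for m in clusters)
    return min_sep / max_diam if max_diam > 0 else math.inf


if __name__ == "__main__":
    pts = [(0.0,), (2.0,), (10.0,), (12.0,)]
    good = [0, 0, 1, 1]  # {0, 2} and {10, 12}
    print(davies_bouldin(pts, good), generalized_dunn(pts, good))  # prints "0.2 5.0"
```

The toy example uses four 1-D points split sensibly into {0, 2} and {10, 12}. If the same points are labelled in an interleaved way ({0, 10} and {2, 12}), the DB index rises to 5.0 and the GDI drops to 0.6. Both metrics therefore penalise the poorer partition.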

Articles published on Dunn Index

Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data

An optimal rough fuzzy clustering algorithm using particle swarm optimisation

Self-Optimal Clustering Technique Using Optimized Threshold Function

Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

Optimal Feature Selection from VMware ESXi 5.1 Feature Set

Intensity drift removal in LC/MS metabolomics by common variance compensation

A novel utility-based model for identifying the customer value in online shopping

Spatial Data Clustering and Zonation of Earthquake Building Damage Hazard Area

Quality Assessment on Satellite Images based on Internal Criterion Techniques

Comparison of clustering algorithms on generalized propensity score in observational studies: a simulation study

A Comparative Study of Color Image Segmentation Using Hard, Fuzzy, Rough Set Based Clustering Techniques

A proposed IPC-based clustering method for exploiting expert knowledge and its application to strategic planning

Mining Spatio-Temporal Data of Fatal Accident

On the Application of Clustering Techniques for Office Buildings' Energy and Thermal Comfort Classification

Analysis of Optimal Data Preprocessing Techniques for Classifying Precipitation Regions

Validation of Clustering Techniques for Student Grouping in Intelligent E-learning Systems

A Density Based Dynamic Data Clustering Algorithm based on Incremental Dataset

A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering

Gene Selection and Cluster Analysis of Yeast Microarray Gene Expression Data