A Bootstrap Aggregating Technique on Link-Based Cluster Ensemble Approach for Categorical Data Clustering

S Pavan Kumar Reddy, U Sesadri

doi:10.24297/ijct.v10i8.1468

Abstract

Although attempts have been made to solve the problem of clustering categorical data via cluster ensembles, with results that are competitive with conventional algorithms, these techniques generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown. The paper presents an analysis suggesting that this problem degrades the quality of the clustering result. It combines bootstrap aggregation (BSA), a machine learning ensemble meta-algorithm designed to improve stability and accuracy, with a new link-based approach that improves the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble. In particular, an efficient BSA and link-based algorithm is proposed for the underlying similarity assessment. Afterward, to obtain the final clustering result, a graph partitioning technique is applied to a weighted bipartite graph that is formulated from the refined matrix. Experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble techniques.

Highlights

Bootstrap aggregating is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression
The method introduced creates an ensemble by applying a conventional clustering algorithm (e.g., k-modes [8] and COOLCAT [17]) to different data partitions, each of which consists of a unique subset of data attributes
The experiments investigate the performance of the link-based cluster ensemble (LCE) approach compared with a number of clustering algorithms, both those developed for categorical data analysis and state-of-the-art cluster ensemble techniques found in the literature

Summary

INTRODUCTION

Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. Given m bootstrap samples drawn from the training data, m models are fitted, one per sample, and combined by averaging their outputs (for regression) or by voting (for classification). Many well-established clustering algorithms, such as k-means [1] and PAM [2], have been designed for numerical data, whose inherent properties can be naturally employed to measure a distance (e.g., Euclidean) between feature vectors [3], [4]. These algorithms cannot be directly applied to the clustering of categorical data, where domain values are discrete and have no defined ordering. The initial method for clustering categorical data was developed in [6] by making use of Gower’s similarity coefficient [7].
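The bagging procedure described at the start of this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`bootstrap_sample`, `fit_1nn`, `bagging`, `predict_vote`, `predict_average`), the one-dimensional toy data, and the choice of a 1-nearest-neighbor base learner are all assumptions made for illustration. Each of the m models is fitted on its own bootstrap sample (drawn with replacement, same size as the training set), and the ensemble prediction is a majority vote for classification or an average for regression.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_1nn(sample):
    """Hypothetical base learner: 1-nearest neighbor on (x, y) pairs with scalar x."""
    points = list(sample)

    def predict(x):
        # Return the target of the training point closest to x.
        nearest = min(points, key=lambda p: abs(p[0] - x))
        return nearest[1]

    return predict


def bagging(data, fit, m, seed=0):
    """Fit m models, each on its own bootstrap sample of data."""
    rng = random.Random(seed)
    return [fit(bootstrap_sample(data, rng)) for _ in range(m)]


def predict_vote(models, x):
    """Classification: return the label predicted by the most models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def predict_average(models, x):
    """Regression: return the mean of the individual model outputs."""
    return mean(model(x) for model in models)


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    labelled = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (5.0, "b"), (5.2, "b"), (5.4, "b")]
    numeric = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

    classifiers = bagging(labelled, fit_1nn, m=25)
    regressors = bagging(numeric, fit_1nn, m=25)

    print(predict_vote(classifiers, 0.1))
    print(predict_average(regressors, 1.5))
```

Because each bootstrap sample omits some training points and repeats others, the individual models differ, and aggregating them tends to reduce the variance of the combined prediction. The fixed `seed` only makes the sketch reproducible.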

Example of Ozone Data

Problem Formulation and General Framework

Ensemble Generation Methods

Cluster Ensembles of Categorical Data

A NOVEL LINK-BASED APPROACH

Creating a Cluster Ensemble

Generating a Refined Matrix

Investigated Data Sets

Experiment Design

Parameter Settings

Bagging nearest neighbor classifiers

Parameter and Complexity Analysis

CONCLUSIONS

REFERENCES:


Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY	Publication Date: Aug 30, 2013
Citations: 29	License type: CC BY 4.0


