A Hierarchical Gamma Mixture Model-Based Method for Classification of High-Dimensional Data

Muhammad Azhar, Joshua Zhexue Huang, Mark Junjie Li

doi:10.3390/e21090906

Abstract

Data classification is an important research topic in the field of data mining. With the rapid development of social media sites and IoT devices, data have grown tremendously in volume and complexity, producing many large and complex high-dimensional data sets. Classifying such high-dimensional complex data with a large number of classes has been a great challenge for current state-of-the-art methods. This paper presents a novel, hierarchical, gamma mixture model-based unsupervised method for classifying high-dimensional data with a large number of classes. In this method, we first partition the features of the dataset into feature strata by using k-means. Then, a set of subspace data sets is generated from the feature strata by using the stratified subspace sampling method. After that, the GMM Tree algorithm is used to identify the number of clusters and the initial cluster centers in each subspace dataset, and these initial cluster centers are passed to k-means to generate the base subspace clustering results. Then, the subspace clustering results are integrated into an object cluster association (OCA) matrix by using the link-based method. The ensemble clustering result is generated from the OCA matrix by the k-means algorithm, with the number of clusters identified by the GMM Tree algorithm. After the ensemble clustering result is produced, the purity of each cluster is computed and the dominant class label is assigned to the cluster. A new object is classified by computing the distance between the object and the center of each cluster in the classifier, and the object is assigned the class label of the cluster with the shortest distance. A series of experiments were conducted on twelve synthetic and eight real-world data sets with different numbers of classes, features, and objects. The experimental results show that the new method outperforms other state-of-the-art techniques on most of the data sets.
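The last two steps of the abstract (labeling each cluster with its dominant class, then classifying a new object by its nearest cluster center) can be illustrated with a minimal sketch. The function names `label_clusters` and `classify`, the use of Euclidean distance, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
import math

def label_clusters(cluster_ids, class_labels):
    """Map each cluster id to (dominant class, purity of that cluster)."""
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)
    result = {}
    for cid, labels in members.items():
        # Purity here is the share of the cluster held by its most common class.
        dominant, count = Counter(labels).most_common(1)[0]
        result[cid] = (dominant, count / len(labels))
    return result

def classify(x, centers, cluster_labels):
    """Return the class label of the cluster whose center is nearest to x."""
    # Euclidean distance is an assumption; the abstract only says "distance".
    nearest = min(centers, key=lambda cid: math.dist(x, centers[cid]))
    return cluster_labels[nearest][0]

# Toy example: five training objects in two clusters.
cluster_ids = [0, 0, 0, 1, 1]
class_labels = ["a", "a", "b", "b", "b"]
centers = {0: (0.0, 0.0), 1: (5.0, 5.0)}

labels = label_clusters(cluster_ids, class_labels)
predicted = classify((4.0, 4.5), centers, labels)
```

In this toy run, cluster 0 is labeled with class "a" at purity 2/3, cluster 1 with class "b" at purity 1, and the new object at (4.0, 4.5) lies closer to the center of cluster 1.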

Highlights

The classification of data is an important research topic in the field of data mining [1,2,3,4,5,6].
The link-based method [36] is used to integrate the clustering results generated from each subspace dataset into an object cluster association (OCA) matrix, on which the k-means algorithm is used to produce the ensemble clustering result with the number of clusters identified by the Gamma Mixture Models (GMMs) Tree algorithm.
We first used the GMM Tree to find the number of feature strata, and a k-means algorithm to divide the set of features of the dataset into feature strata.
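As a rough illustration of what an object cluster association matrix looks like, the sketch below builds only the plain binary version: one row per object, one column per cluster from every base clustering, and a 1 where the object belongs to that cluster. The link-based method [36] is not reproduced here, and the function name `binary_association_matrix` is hypothetical.

```python
def binary_association_matrix(base_clusterings):
    """Build a binary object-by-cluster matrix from several base clusterings.

    base_clusterings is a list of label lists, one per subspace clustering,
    each giving the cluster id of every object in the same object order.
    """
    n_objects = len(base_clusterings[0])
    # One column per (clustering index, cluster id) pair.
    columns = []
    for m, labels in enumerate(base_clusterings):
        for cid in sorted(set(labels)):
            columns.append((m, cid))
    matrix = [
        [1.0 if base_clusterings[m][i] == cid else 0.0 for (m, cid) in columns]
        for i in range(n_objects)
    ]
    return matrix, columns

# Toy example: three objects clustered in two different subspaces.
oca, oca_columns = binary_association_matrix([[0, 0, 1], [1, 0, 0]])
```

Each row of this binary matrix sums to the number of base clusterings, since every object belongs to exactly one cluster in each of them. The rows could then be passed to k-means to obtain an ensemble clustering, as the highlight describes for the OCA matrix.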

Summary

Introduction

The classification of data is an important research topic in the field of data mining [1,2,3,4,5,6]. The major issue with existing techniques is their poor performance, in terms of both classification accuracy and computation cost, when classifying high-dimensional data with a large number of classes. To solve this key issue, we propose a new Hierarchical Gamma Mixture Model-based Unsupervised Method in this paper. In this hierarchical method, we apply a subspace ensemble approach, named the Stratified Subspace, to deal with this challenging problem. It integrates stratified sampling, subspace clustering, GMM Tree, k-means, and the link-based approach into a single algorithm for classifying high-dimensional complex data that suffer from the curse of dimensionality and contain a large number of classes.
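The stratified sampling step can be sketched as follows, assuming the feature strata have already been formed (for example by clustering the features with k-means). The function name `stratified_subspaces`, the fixed number of features drawn per stratum, and the toy strata are assumptions for illustration; the paper's actual sampling rule may differ.

```python
import random

def stratified_subspaces(strata, n_subspaces, per_stratum, seed=0):
    """Draw feature subsets so that every stratum is represented in each one.

    strata is a list of lists of feature indices. Each subspace takes up to
    per_stratum features, sampled without replacement, from every stratum.
    """
    rng = random.Random(seed)
    subspaces = []
    for _ in range(n_subspaces):
        features = []
        for stratum in strata:
            k = min(per_stratum, len(stratum))
            features.extend(rng.sample(stratum, k))
        subspaces.append(sorted(features))
    return subspaces

# Toy example: ten features grouped into three strata.
strata = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
subspaces = stratified_subspaces(strata, n_subspaces=4, per_stratum=2)
```

Each resulting subspace is a list of column indices that can be used to slice the training data into one subspace dataset, which is then clustered on its own.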

Related Work

Overview of GMM Tree

Generation of Feature Strata from the Training Dataset Dtrain

Generation of Subspace Data Sets from the Training Dataset Dtrain

Generation of Clustering Results from Subspace Data Sets

Generation of Ensemble Clusters from Individual Subspace Clustering Results

Assignment of Class Labels to the Clusters in Ensemble Clustering Result

Experimental Settings

Experimental Results

Conclusions and Future Work

Journal: Entropy	Publication Date: Sep 18, 2019
Citations: 3	License type: CC BY 4.0
