Abstract

Feature selection is a challenging problem that arises in the high-dimensional data analysis of many major applications. It addresses the curse of dimensionality by determining a small set of features that represents high-dimensional data without significant loss of information. The purpose of this study is to develop and investigate a new unsupervised feature selection method that uses the k-influence space concept and subspace learning to map features onto a weighted graph and rank them by importance according to the PageRank graph centrality measure. The graph design in this method promotes feature relevance, downgrades redundancy, and is robust to outliers and cluster imbalances. In K-Means clustering experiments on the ASU feature selection benchmark datasets, the method produces better accuracy and normalized mutual information results than state-of-the-art unsupervised feature selection algorithms. In a further evaluation on a dataset of over 14,000 tweets, conventional classification of the features selected by the method gave better sentiment analysis results than deep learning feature selection and classification.
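The pipeline described above (build a weighted feature graph, then rank features by PageRank centrality) can be illustrated with a minimal Python sketch. The graph construction below uses a simple absolute-correlation affinity and a hypothetical threshold parameter as a stand-in for the paper's influence-space and subspace-learning design, which is not reproduced here.

import numpy as np
import networkx as nx

def rank_features_by_pagerank(X, threshold=0.1):
    """Rank the columns (features) of X by PageRank centrality on a
    feature-feature similarity graph. Illustrative only: the affinity
    and threshold are assumptions, not the paper's graph construction."""
    # Feature-feature affinity: absolute Pearson correlation between columns.
    affinity = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(affinity, 0.0)

    # Keep only sufficiently strong edges to obtain a sparse weighted graph.
    adjacency = np.where(affinity >= threshold, affinity, 0.0)
    graph = nx.from_numpy_array(adjacency)

    # PageRank scores serve as feature importance; higher means more central.
    scores = nx.pagerank(graph, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)

# Example: keep the 10 highest-ranked features of a random data matrix.
X = np.random.rand(200, 50)
selected = rank_features_by_pagerank(X)[:10]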

Highlights

  • Progress in science and technology has allowed the development of applications that use very large data sets of high-dimensional data

  • The purpose of this study is to investigate a new unsupervised feature selection method, called Influence Space and Graph-based Feature Selection (ISGFS), which uses the k-influence space concept [22]–[24] and subspace learning to describe feature relationships and subsequently design a feature selection graph (a rough sketch of the influence-space computation follows this list)

  • Although the Unsupervised Graph-based Feature Selection (UGFS) method performed significantly better than other methods, three drawbacks have been noted: (i) the elements in a k-nearest neighbors set may belong to different clusters, so the search for cluster-discriminating features may be unduly affected; (ii) considering all data points for feature combination may unduly change the results, especially if the data set contains outliers; and (iii) the correlation between features is not exploited as a means to disfavor redundant features [15]

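As a rough illustration of the influence-space idea mentioned above, the sketch below computes, for each data point, the intersection of its k-nearest neighbors and its reverse k-nearest neighbors. This follows the definition commonly used in the influence-space literature; the exact variant adopted in [22]–[24], and the way it feeds into the feature selection graph, are not reproduced here.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_influence_spaces(X, k=5):
    """Return, for every point, the set of indices forming its k-influence
    space, taken here as kNN(p) intersected with reverse-kNN(p)."""
    # k+1 neighbors because each point is returned as its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    knn = [set(row[1:]) for row in idx]          # drop the point itself

    # Reverse k-NN: q is in RkNN(p) exactly when p is in kNN(q).
    rknn = [set() for _ in range(len(X))]
    for q, neighbours in enumerate(knn):
        for p in neighbours:
            rknn[p].add(q)

    # k-influence space of p: neighbors of p that also count p among their own.
    return [knn[p] & rknn[p] for p in range(len(X))]

# Example: influence spaces for 100 random points in 8 dimensions.
X = np.random.rand(100, 8)
spaces = k_influence_spaces(X, k=5)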

Summary

Introduction

Progress in science and technology has allowed the development of applications that use very large data sets of high-dimensional data. These applications occur in various domains, most notably natural language processing, pattern recognition, and computer vision [1], [2]. The curse of dimensionality makes models likely to overfit the training data and fail to generalize to new, unseen data [1], [4], [5]. Recent studies have addressed these issues in various ways [6]–[8]: dimensionality reduction through feature selection and feature reduction, performed before data analysis [9]; subspace learning to determine the layout and properties of the data in order to assist clustering [7] and classification [10]; and data representation by similarity and kernel functions [6].

