Towards an Unsupervised Feature Selection Method for Effective Dynamic Features

Naif Almusallam, Jeffrey Chan, Abdulatif Alabdulatif, Adil Fahad, Zahir Tari, Mohammed Al-Naeem

doi:10.1109/access.2021.3082755

Abstract

Dynamic features applications present new obstacles for the selection of streaming features. These applications have two defining characteristics: a) features are processed sequentially while the number of instances is fixed; and b) the feature space does not exist in advance. For example, in a text classification task for spam detection, new features (e.g. words) are generated dynamically and need to be mined as they arrive to filter out spam, rather than waiting for all features to be collected. Traditional feature selection methods, which are not designed for streaming features applications, cannot be used in such an environment because they require the full feature space in advance to statistically determine the representative features. Existing methods that do address feature selection in dynamic features applications require class labels to select the representative features. However, most real-life data is unlabeled, and manual labeling is costly. In this paper, an efficient unsupervised feature selection method is proposed for streaming features applications where the number of features increases while the number of instances remains fixed. In particular, Unsupervised Feature Selection for Dynamic Features (UFSSF) is developed to determine the representative streaming features without requiring prior knowledge of class labels or representative features. UFSSF extends k-means clustering to cumulatively determine whether a newly arrived feature should be selected as a representative streaming feature or discarded. Experimental results show that UFSSF achieves significant accuracy and efficient execution time compared with other benchmark methods.
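The abstract gives no algorithmic detail beyond "extends k-means clustering", so the following is only a rough sketch of the general idea, not the UFSSF algorithm itself. It assumes each feature is a vector with one value per instance, that clusters are updated with a sequential (running-mean) k-means step, and that a cluster keeps whichever member feature lies closest to its centroid. The class name `StreamingFeatureSelector` and the method `offer` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamingFeatureSelector:
    """Sequential k-means over feature vectors (illustrative assumption only)."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one running-mean vector per cluster
        self.counts = []     # number of features absorbed by each cluster
        self.selected = []   # (name, vector) representative of each cluster

    def offer(self, name, vector):
        """Process one arriving feature; return True if it is kept."""
        vector = list(vector)
        # The first k features seed the clusters and are kept provisionally.
        if len(self.centroids) < self.k:
            self.centroids.append(vector[:])
            self.counts.append(1)
            self.selected.append((name, vector))
            return True
        # Assign the new feature to its nearest centroid.
        j = min(range(self.k), key=lambda i: distance(vector, self.centroids[i]))
        # Incremental mean update: c <- c + (x - c) / n.
        self.counts[j] += 1
        n = self.counts[j]
        self.centroids[j] = [c + (x - c) / n for c, x in zip(self.centroids[j], vector)]
        # Keep the feature closer to the updated centroid; ties keep the incumbent.
        incumbent = self.selected[j][1]
        if distance(vector, self.centroids[j]) < distance(incumbent, self.centroids[j]):
            self.selected[j] = (name, vector)
            return True
        return False


selector = StreamingFeatureSelector(k=2)
stream = [
    ("f1", [0, 0, 0, 0]),
    ("f2", [10, 10, 10, 10]),
    ("f3", [4, 0, 0, 0]),
    ("f4", [2, 0, 0, 0]),
]
for feature_name, values in stream:
    selector.offer(feature_name, values)
print([name for name, _ in selector.selected])  # ['f4', 'f2']
```

Only the k centroids and k representatives are stored, so memory does not grow with the number of features seen, which is the property a streaming-features setting needs. No class labels are used at any point.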

Highlights

High-dimensional data decreases the performance of machine learning algorithms in dynamic features applications
Mitra’s method [13] involves three similarity measures: Least Square Regression Error (LSRE), Pearson Correlation Coefficient (PCC) and Maximal Information Compression Index (MICI), while SPEC can work with the RBF Kernel similarity measure
This paper develops an unsupervised feature selection method for effective dynamic features that can reduce the dimensionality of streaming features applications, known as the dynamic feature space
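For readers unfamiliar with the similarity measures named in the highlights, the sketch below computes them for a pair of feature vectors using their commonly stated definitions: LSRE as var(y)(1 - ρ²), and MICI as the smaller eigenvalue of the 2x2 covariance matrix of the two features. The function names and the RBF bandwidth parameter `sigma` are assumptions made for illustration; the page does not say how the paper configures these measures.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def variance(v):
    """Population variance of a feature vector."""
    m = _mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def covariance(x, y):
    """Population covariance of two equal-length feature vectors."""
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def pcc(x, y):
    """Pearson correlation coefficient; assumes neither feature is constant."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


def lsre(x, y):
    """Least square regression error of predicting y linearly from x."""
    return variance(y) * (1.0 - pcc(x, y) ** 2)


def mici(x, y):
    """Maximal information compression index: smallest eigenvalue of cov(x, y)."""
    vx, vy = variance(x), variance(y)
    disc = (vx + vy) ** 2 - 4.0 * vx * vy * (1.0 - pcc(x, y) ** 2)
    # Clamp at zero to guard against tiny negative values from rounding.
    return 0.5 * (vx + vy - math.sqrt(max(disc, 0.0)))


def rbf(x, y, sigma=1.0):
    """RBF kernel similarity; sigma is an assumed bandwidth parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

LSRE and MICI are both zero when two features are perfectly linearly related and grow as the relationship weakens, so lower values indicate more redundant feature pairs. PCC and the RBF kernel run the other way: values near 1 indicate high similarity.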

Summary

Introduction

High-dimensional data decreases the performance of machine learning algorithms in dynamic features applications. Feature selection methods have been applied to identify the representative features of data streams and so eliminate obstacles related to data dimensionality. However, current feature selection methods cannot be applied effectively to streaming features applications, where features arrive sequentially. In the category of streaming data, there is a dynamic number of instances and a fixed number of features. The focus here is on the streaming features category, where there is a fixed number of instances and a dynamic number of features. These features are processed one by one upon their arrival. One real-world application that can be categorized as streaming features

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 13
License type: CC BY 4.0
