Abstract
Feature selection (FS) is an important research topic in machine learning. FS is usually modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data: a data set with missing entries is less reliable, and FS performed on such a data set inherits this unreliability. To control this issue directly, we propose in this study a novel modelling of FS that includes reliability as a third objective of the problem. To address the modified problem, we apply the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository and used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (k-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
Highlights
A large number of data sets contain many irrelevant or redundant features
The “+” denotes that NSGA-III is significantly better than the comparison approach, the “−” denotes that the comparison approach is significantly better than NSGA-III, and “=” denotes that NSGA-III and the comparison approach have similar results
This paper proposes a novel interpretation of the feature selection (FS) problem in data science, with specific reference to data sets with missing data
Summary
A large number of data sets contain many irrelevant or redundant (useless) features. After applying the mean imputation approach, this paper models the reliability of the data as a third objective of the multi-objective optimization problem. Unlike previous studies in the literature, which consider only classification accuracy and solution size, this paper also introduces the missing rate for FS in order to enhance the reliability of FS. We employ the mean imputation method, a single-imputation technique, to interpolate the missing data; we chose this method because its low computational complexity and modest execution time make it well suited to large data sets, see [40]. The missing rate of feature j is defined as m_j/N, where m_j is the number of missing entries associated with feature j and N is the total number of instances.
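The two data-handling steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the paper defines the per-feature missing rate m_j/N, and averaging those rates over a candidate feature subset (as `missing_rate` does below) is our assumption about how a subset-level objective could be formed.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of its feature (column),
    computed over the observed values only (mean imputation)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)          # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def missing_rate(X_raw, subset):
    """Average per-feature missing rate m_j / N over the selected
    feature subset, measured on the raw (pre-imputation) data.
    Averaging over the subset is an illustrative assumption."""
    N = X_raw.shape[0]
    rates = np.isnan(X_raw[:, subset]).sum(axis=0) / N
    return rates.mean()

# Toy data: 4 instances, 3 features; feature 0 has two missing entries.
X = np.array([[1.0,    2.0, 3.0],
              [np.nan, 2.0, 1.0],
              [np.nan, 4.0, 5.0],
              [3.0,    0.0, 7.0]])

print(missing_rate(X, [0, 1]))   # features 0 and 1: rates 0.5 and 0.0 -> 0.25
print(mean_impute(X)[1, 0])      # imputed with mean of observed values (1+3)/2 = 2.0
```

In a three-objective FS loop, each candidate subset would then be scored on k-NN accuracy, subset size, and this missing-rate term, with NSGA-III sorting the resulting fronts.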