Abstract

Due to the extensive use of high-dimensional data and its application in a wide range of scientific fields of research, dimensionality reduction has become a major part of the preprocessing step in machine learning. Feature selection is one procedure for reducing dimensionality. In this process, instead of using the whole set of features, a subset is selected to be used in the learning model. Feature selection (FS) methods are divided into three main categories: filters, wrappers, and embedded approaches. Filter methods depend only on the characteristics of the data and do not rely on the learning model at hand. Divergence functions, as measures of the difference between probability distributions, can be used as filter methods for feature selection. In this paper, the performance of several divergence functions, such as Jensen-Shannon (JS) divergence and Exponential divergence (EXP), is compared with that of some of the best-known filter feature selection methods, such as Information Gain (IG) and Chi-Squared (CHI). The comparison is based on the accuracy rate and F1-score of classification models trained after applying these feature selection methods.
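As a concrete illustration of the idea described above, the sketch below scores a single discretized feature by the Jensen-Shannon divergence between its class-conditional distributions, so that a larger score suggests a more discriminative feature. This is only a minimal sketch under the assumptions of a binary target and pre-binned features; the function names (js_divergence, js_filter_score) are illustrative and not taken from the paper.

```python
# Minimal sketch: Jensen-Shannon divergence as a filter score for one feature.
# Assumes x_binned and y are NumPy arrays, the feature is already discretized,
# and the target is binary (0/1). Names here are illustrative only.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_filter_score(x_binned, y):
    """Score a discretized feature by the JS divergence between its
    class-conditional distributions; larger means more discriminative."""
    bins = np.unique(x_binned)
    c0 = np.array([(x_binned[y == 0] == b).sum() for b in bins])
    c1 = np.array([(x_binned[y == 1] == b).sum() for b in bins])
    return js_divergence(c0, c1)

# Usage sketch: rank the columns of an already-binned matrix X against labels y.
# scores = [js_filter_score(X[:, j], y) for j in range(X.shape[1])]
```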

Highlights

  • In recent years, handling high-dimensional data has become a major part of machine learning and statistics, including classification problems

  • We review several filter feature selection (FS) methods based on divergence functions, alongside Information Gain (IG) and Chi-Squared (CHI)

  • Based on Figures 1-7 and Figure 8, it can be inferred that although the selected features vary across FS methods, the number of selected features is similar most of the time


Summary

Introduction

In recent years, dealing with high-dimensional data has become a major part of machine learning and statistics, including classification problems. There are multiple ways to perform FS, but in general the procedure falls into three main categories [11]: filters, wrappers, and embedded methods. It can be seen from Eq. (1) that if the joint probability distribution p(x, y) and the product of the marginal distributions of X and Y, i.e. p(x)p(y), are close to each other, IG(Y, X) approaches zero. This means that we gain little information about Y from observing X. Similarly, in Eq. (3), if p(x_i, y_j) and p(x_i)p(y_j) are close to each other for all i and j, CHI(X, Y) tends to zero. This method is based on another divergence function, called Kagan's divergence [26], which can be formulated analogously.
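To make this relationship concrete, the sketch below computes both scores from the contingency table of a discrete feature X and label Y, showing that IG (mutual information) and the chi-squared statistic both compare the joint distribution p(x, y) with the product of marginals p(x)p(y) and both approach zero when the two are close. It is a minimal illustration of the independence argument above, not the paper's implementation; ig_and_chi is an assumed helper name.

```python
# Sketch: IG (mutual information) and chi-squared from a contingency table.
# Assumes counts[i, j] = number of samples with X = x_i and Y = y_j, and that
# no row or column of the table is entirely zero.
import numpy as np

def ig_and_chi(counts):
    n = counts.sum()
    p_xy = counts / n                      # joint p(x_i, y_j)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x_i)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y_j)
    prod = p_x * p_y                       # product of marginals p(x_i)p(y_j)
    nz = p_xy > 0
    ig = np.sum(p_xy[nz] * np.log(p_xy[nz] / prod[nz]))  # mutual information
    chi = n * np.sum((p_xy - prod) ** 2 / prod)           # chi-squared statistic
    return ig, chi

# When X and Y are (nearly) independent, p_xy ≈ prod and both scores vanish:
# ig_and_chi(np.array([[25, 25], [25, 25]]))  ->  (0.0, 0.0)
```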

Related works
Divergence functions
Bregman’s divergences
Experimental Design
Average number of selected features
F1-Score
Conclusion