Abstract

Feature selection is one of the most important preprocessing steps in machine learning. Its methods can be broadly divided into search-based and ranking-based approaches. Ranking-based methods are popular because they require far less computational power. Features can be ranked in many different ways; one way to measure the effectiveness of a feature is to evaluate its ability to separate the classes involved. Such inter-class separability measures can be used directly as feature-ranking tools for binary classification problems. The Bhattacharyya distance, the most popular among them, has mainly been used in a recursive setup to select good-quality feature subsets. The Jeffries-Matusita (JM) distance improves on the Bhattacharyya distance by normalizing it between 0 and 2. In this paper, we rank features by their JM distance. Experiments over 24 public datasets show results comparable with mutual-information-, Relief-, and chi-squared-based measures, obtained in much less time. The JM distance also provides some intuition about a dataset before any feature selection or machine learning algorithm is applied. We compare the classification accuracy and JM scores of these datasets, which gives a good indication of how suitable a dataset is for classification and points out the need for (or lack of) further feature collection.
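To make the ranking idea concrete, the following is a minimal sketch (not the authors' implementation) of JM-distance feature ranking for a binary problem. It assumes each feature is roughly Gaussian within each class, uses the standard univariate Bhattacharyya distance B = (1/4)(μ₁−μ₂)²/(σ₁²+σ₂²) + (1/2)ln[(σ₁²+σ₂²)/(2σ₁σ₂)], and the normalization JM = 2(1 − e^(−B)); the function names are illustrative, not from the paper.

```python
import numpy as np

def jm_distance(x_pos, x_neg):
    """Jeffries-Matusita distance between two classes for one feature,
    assuming each class is (roughly) univariate Gaussian.
    Returns a score in [0, 2); higher means better class separation."""
    m1, m2 = np.mean(x_pos), np.mean(x_neg)
    v1, v2 = np.var(x_pos), np.var(x_neg)
    # Bhattacharyya distance between two univariate Gaussians
    b = 0.25 * (m1 - m2) ** 2 / (v1 + v2) \
        + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    # JM distance normalizes B into the range [0, 2)
    return 2.0 * (1.0 - np.exp(-b))

def rank_features(X, y):
    """Rank the columns of X (binary labels y in {0, 1}) by JM score,
    best-separating feature first."""
    scores = [jm_distance(X[y == 1, j], X[y == 0, j])
              for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]  # descending by separability
    return order, scores
```

For example, on synthetic data where one feature has well-separated class means and another is pure noise, the separable feature receives a JM score close to 2 and is ranked first, while the noise feature scores near 0.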
