Abstract

Feature subset selection is an effective approach for selecting a compact subset of features from the original set, removing irrelevant and redundant features from datasets. In this paper, a novel algorithm is proposed to select the best subset of features based on mutual information with a local non-uniformity correction estimator. The proposed algorithm consists of three phases: in the first phase, a ranking function measures the dependency and relevance among features; in the second phase, candidates with higher dependency and minimum redundancy are selected for the optimal subset; in the last phase, the produced subset is refined with a forward and backward wrapper filter to ensure its effectiveness. Datasets from the UCI Machine Learning Repository are used for validation and testing. The proposed algorithm shows significant improvements in terms of classification accuracy and time complexity.
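To make the three-phase idea concrete, the sketch below implements a relevance/redundancy selection loop followed by a backward wrapper refinement. It is a minimal, illustrative sketch only: scikit-learn's mutual_info_classif and mutual_info_score are used as stand-ins for the paper's local non-uniformity corrected estimator, the equal-width discretization, the KNN classifier, and the function name select_features are assumptions, and the greedy mRMR-style scoring is a generic approximation of the paper's ranking function.

```python
# Illustrative three-phase selection loop (not the paper's exact estimator).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def _discretize(col, bins=10):
    # Equal-width binning so mutual_info_score can be applied to continuous features.
    return np.digitize(col, np.histogram_bin_edges(col, bins=bins))

def select_features(X, y, k=10, cv=5):
    n_features = X.shape[1]

    # Phase 1: rank features by relevance, i.e. MI between each feature and the class.
    relevance = mutual_info_classif(X, y, random_state=0)

    # Phase 2: greedily add the candidate with maximum relevance minus mean
    # redundancy (MI with the already-selected features).
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_info_score(_discretize(X[:, j]), _discretize(X[:, s]))
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)

    # Phase 3: backward wrapper refinement -- drop a feature whenever removing it
    # does not hurt cross-validated accuracy of a simple classifier.
    clf = KNeighborsClassifier()
    base_acc = cross_val_score(clf, X[:, selected], y, cv=cv).mean()
    for f in list(selected):
        trial = [s for s in selected if s != f]
        if trial and cross_val_score(clf, X[:, trial], y, cv=cv).mean() >= base_acc:
            selected = trial
            base_acc = cross_val_score(clf, X[:, selected], y, cv=cv).mean()
    return selected
```

A forward wrapper pass can be added symmetrically by re-testing discarded candidates; the backward pass is shown here only to keep the sketch short.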

Highlights

  • In many applications of machine learning, the number of samples and dimensions of most datasets have grown rapidly [1]

  • Since the computational power, processing time and classification accuracy depend on the size of the data, reducing the dataset represents a challenge for researchers

  • Feature subset selection provides an approach for dimensionality reduction and data minimization by replacing the original set of features with a compact subset that behaves similarly to the original one


Summary

INTRODUCTION

In many applications of machine learning, the number of samples and dimensions of most datasets have grown rapidly [1]. Feature subset selection provides an approach for dimensionality reduction and data minimization by replacing the original set of features with a compact subset that behaves similarly to the original one. Feature subset selection is categorized into two main approaches in terms of evaluation strategy [1]. First, the wrapper approach, which depends on searching the whole search space to find the optimal subset [9]; it evaluates every combination of subsets and determines its accuracy with the classifier's prediction function. Second, the filter approach, which scores features with a measure that is independent of any classifier. Even though the filter approach is faster than the wrapper approach, it suffers from a lack of information between the features and the classifier, and may select irrelevant or redundant features because of the limitations of the evaluation function [17]. A small sketch contrasting the two evaluation strategies follows.
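The sketch below contrasts the two evaluation strategies on a candidate subset of feature columns. It is a minimal illustration, not the paper's procedure: the function names filter_score and wrapper_score, the use of scikit-learn's mutual_info_classif as the filter measure, and the choice of a KNN classifier for the wrapper are all assumptions made for the example.

```python
# Filter vs. wrapper evaluation of a candidate feature subset (illustrative only).
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def filter_score(X, y, subset):
    # Filter approach: score the subset with a classifier-independent measure
    # (here, the mean MI between each selected feature and the class).
    return mutual_info_classif(X[:, subset], y, random_state=0).mean()

def wrapper_score(X, y, subset, cv=5):
    # Wrapper approach: train and evaluate an actual classifier on the subset.
    # More faithful to final accuracy, but far more expensive when many
    # candidate subsets must be compared.
    return cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=cv).mean()
```

The cost difference is the key point: the filter score is computed once per feature (or subset) from the data alone, while the wrapper score requires training the classifier for every candidate subset considered.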

RELATED WORK
PRELIMINARIES
A MUTUAL INFORMATION BASED UNCERTAINTY MEASURE
Feature selection algorithm
Proposed Method
Dataset Description
Numeric Results And Comparative Studies
CONCLUSION