Abstract

Feature selection aims to eliminate redundant and irrelevant features from the data so that medical datasets can be analyzed more deeply and richly. Although the filter method has been touted as one of the simplest methods for feature selection, its applications have generally failed to identify and deal with embedded similarities among features. This research proposes a hybrid feature selection approach that combines the filter method with hierarchical agglomerative clustering to eliminate irrelevant and redundant features in four medical datasets. A formal evaluation of the proposed approach shows major improvements in classification accuracy compared with results obtained by applying filter methods alone and/or more classical feature selection approaches.

Highlights

  • In pursuing deeper and richer analytic processing of medical datasets, a key challenge in building a superior machine learning (ML) classification model is identifying a set of representative features that are inherently embedded in cumulative health datasets

  • Past research has investigated the application of various feature selection methods that are of growing interest to the medical data analytics research community (Polat & Güneş, 2009; Akay, 2009; Shilaskar & Ghatol, 2013; Lavanya & Rani, 2011; Anbarasi, Anupriya & Iyengar, 2010; Inbarani, Azar & Jothi, 2014; Kumar, Ramachandra & Nagamani, 2014; Ibrahim, Ojo & Oluwafisoye, 2018)

  • Similarity is a quantity that reflects the strength of the relationship between two data items, whereas dissimilarity measures the divergence between two data items (Irani, Pise & Phatak, 2016). Based on two methods, the filter method and the hierarchical agglomerative clustering (HAC) algorithm, we propose an approach for feature selection
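
The idea of combining a filter step with HAC can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of absolute Pearson correlation as the filter score, the feature distance 1 - |r|, average linkage, and the two thresholds (`min_relevance`, `max_distance`) are all assumptions made here for illustration, as are the function names and the toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def filter_scores(columns, labels):
    """Filter step (assumed score): |correlation| of each feature with the class label."""
    return [abs(pearson(col, labels)) for col in columns]

def hac_clusters(columns, max_distance):
    """Average-linkage agglomerative clustering of features.

    The distance between two features is assumed to be 1 - |Pearson correlation|.
    Merging stops once the closest pair of clusters is farther apart than max_distance.
    """
    n = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > max_distance:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def select_features(columns, labels, min_relevance=0.1, max_distance=0.3):
    """Hybrid sketch: drop irrelevant features, cluster the rest, keep one per cluster."""
    scores = filter_scores(columns, labels)
    relevant = [i for i, s in enumerate(scores) if s >= min_relevance]
    if not relevant:
        return []
    clusters = hac_clusters([columns[i] for i in relevant], max_distance)
    # Keep the highest-scoring feature of each cluster as its representative.
    return sorted(max((relevant[k] for k in cluster), key=lambda i: scores[i])
                  for cluster in clusters)

# Toy data (hypothetical): 8 samples, 4 features stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 5, 6, 5, 6],      # feature 0: relevant
    [2, 4, 2, 4, 10, 12, 10, 12],  # feature 1: exact multiple of feature 0 (redundant)
    [1, 2, 1, 2, 1, 2, 1, 2],      # feature 2: uncorrelated with the label (irrelevant)
    [0, 1, 0, 1, 1, 0, 1, 1],      # feature 3: weakly relevant, distinct from feature 0
]
selected = select_features(columns, labels)
print(selected)
```

On this toy data the filter step discards feature 2, and features 0 and 1 fall into a single cluster from which only one representative is kept, while feature 3 forms its own cluster. In practice a library routine such as SciPy's hierarchical clustering would replace the quadratic loop above, and the thresholds would need tuning per dataset.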


Summary

Introduction

In pursuing deeper and richer analytic processing of medical datasets, a key challenge in building a superior machine learning (ML) classification model is identifying a set of representative features that are inherently embedded in cumulative health datasets. This representative set should contain mostly relevant and non-redundant features so as to achieve improved accuracy and better classification results for data modeling. More recent filter-based feature selection approaches, such as mRMR (minimum Redundancy Maximum Relevance), have been designed for improved feature selection on microarray data. Other prominent feature selection approaches include the Fast Correlation Based Filter (FCBF), FAST, and methods based on Genetic Algorithms (GAs), Genetic Programming (GP), and Particle Swarm Optimization (PSO).

