Abstract

Background and objective

Medical data play a decisive role in disease diagnosis. Redundant and irrelevant features often reduce classification accuracy on high-dimensional datasets, which makes feature selection an indispensable step. Feature selection aims to identify a feature subspace that retains classification accuracy while reducing the computational cost of the learning model and eliminating noise. Whether a feature selection approach is suitable depends heavily on how well it matches the problem framework and uncovers the intrinsic patterns in the data. The primary objective of this paper is to develop an ensemble-filter-based hybrid feature selection model for disease detection.

Methods

This paper introduces a four-step hybrid ensemble feature selection algorithm. First, the dataset is partitioned using cross-validation. Second, in the filter step, several filter methods are ensembled through weighted scores to produce a ranking of the features. Third, sequential forward selection is used as a wrapper technique to obtain an optimal subset of features. Finally, the resulting optimal subset is passed on to the subsequent classification tasks.

Results

Experiments were performed on twenty benchmark medical datasets of different dimensionalities. The proposed hybrid approach is compared with fourteen state-of-the-art feature selection algorithms using four benchmark classifiers: Naïve Bayes, Support Vector Machine with a Radial Basis Function kernel, Random Forest, and k-Nearest Neighbor. The empirical results show that the proposed hybrid method outperforms the competing methods in accuracy, sensitivity, specificity, F1-score, and area under the curve (AUC), as well as in the number of features selected. Statistical analysis of the results confirms that the proposed method outperforms, or remains competitive with, various state-of-the-art algorithms.

Conclusions

This study concludes that the proposed hybrid approach is a more effective and reliable technique for selecting highly discriminative features. The framework can serve as a promising tool for clinicians and researchers seeking to improve classification performance on medical datasets.
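The first three steps of the four-step procedure described under Methods can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which filters are ensembled or how they are weighted, so the two filters (absolute Pearson correlation and Fisher score), the min-max normalization, the equal weights, the 3-NN evaluator used inside the wrapper, and all function names (`hybrid_select`, `ensemble_rank`, `sequential_forward_selection`, etc.) are hypothetical choices.

```python
# Hypothetical sketch: the filters, weights, and k-NN evaluator are assumptions, not the paper's choices.
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]

def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Step 1: partition row indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def pearson_score(col: Sequence[float], y: Sequence[int]) -> float:
    """Filter A (assumed): absolute Pearson correlation with the class label."""
    mx, my = statistics.fmean(col), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in col))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov) / (sx * sy) if sx and sy else 0.0

def fisher_score(col: Sequence[float], y: Sequence[int]) -> float:
    """Filter B (assumed): between-class spread divided by within-class spread."""
    mu = statistics.fmean(col)
    num = den = 0.0
    for c in set(y):
        vals = [a for a, b in zip(col, y) if b == c]
        num += len(vals) * (statistics.fmean(vals) - mu) ** 2
        den += len(vals) * statistics.pvariance(vals)
    return num / (den + 1e-12)  # epsilon guards against zero within-class spread

def minmax(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so filters on different scales can be summed."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def ensemble_rank(X: Matrix, y: Sequence[int],
                  weights: Tuple[float, float] = (0.5, 0.5)) -> List[int]:
    """Step 2: weighted sum of normalized filter scores; best feature first."""
    cols = list(zip(*X))
    combined = [0.0] * len(cols)
    for w, score_fn in zip(weights, (pearson_score, fisher_score)):
        for j, s in enumerate(minmax([score_fn(c, y) for c in cols])):
            combined[j] += w * s
    return sorted(range(len(cols)), key=lambda j: -combined[j])

def knn_predict(train_X: Matrix, train_y: List[int], x: List[float], k: int = 3) -> int:
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = (sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X)
    nearest = sorted(zip(dists, train_y))[:k]
    return statistics.mode([lab for _, lab in nearest])

def cv_accuracy(X: Matrix, y: Sequence[int], feats: List[int], folds: List[List[int]]) -> float:
    """Pooled k-NN accuracy over all folds, using only the columns in `feats`."""
    if not feats:
        return 0.0
    def project(i: int) -> List[float]:
        return [X[i][j] for j in feats]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        tr_X, tr_y = [project(i) for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tr_X, tr_y, project(i)) == y[i] for i in test)
    return correct / len(X)

def sequential_forward_selection(X: Matrix, y: Sequence[int], ranking: List[int],
                                 folds: List[List[int]],
                                 top_m: Optional[int] = None) -> Tuple[List[int], float]:
    """Step 3: greedily add the feature that most improves CV accuracy; stop when none helps."""
    candidates = list(ranking[:top_m] if top_m else ranking)
    selected: List[int] = []
    best = 0.0
    while candidates:
        best_f, best_s = None, best
        for f in candidates:  # scanned in ranking order, so ties favour higher-ranked features
            s = cv_accuracy(X, y, selected + [f], folds)
            if s > best_s:
                best_f, best_s = f, s
        if best_f is None:
            break
        selected.append(best_f)
        candidates.remove(best_f)
        best = best_s
    return selected, best

def hybrid_select(X: Matrix, y: Sequence[int], n_folds: int = 5,
                  top_m: Optional[int] = None) -> Tuple[List[int], float]:
    folds = kfold_indices(len(X), n_folds)  # step 1: cross-validation partition
    ranking = ensemble_rank(X, y)           # step 2: ensemble filter ranking
    return sequential_forward_selection(X, y, ranking, folds, top_m)  # step 3: SFS wrapper
```

Step 4 would pass only the `selected` columns to the downstream classifiers (Naïve Bayes, SVM with an RBF kernel, Random Forest, k-NN). For brevity, the sketch computes the filter ranking once on the full dataset. A more careful version would rank features on the training folds only, to avoid optimistic selection bias. Restricting the wrapper to the `top_m` ranked features is what keeps the filter-then-wrapper hybrid cheaper than running sequential forward selection over every feature.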
