Abstract

Classification analysis is widely used to enhance the quality of healthcare applications by analysing data and discovering hidden patterns and relationships between features, which can be used to support medical diagnostic decisions and improve the quality of patient care. A healthcare dataset often contains irrelevant, redundant, and noisy features; applying classification algorithms to such data may produce less accurate and less understandable results. Therefore, the selection of optimal features has a significant influence on the accuracy of classification systems. Feature selection is an effective data pre-processing technique in data mining that can be used to identify a minimum set of features. It has immediate effects on speeding up classification algorithms and improving performance measures such as predictive accuracy. This paper aims to evaluate the performance of five classification methods, namely C5.0, Rpart, k-nearest neighbor (KNN), Support Vector Machines (SVM), and Random Forest (RF), combined with three feature selection methods, namely correlation-based feature selection, variable importance selection, and recursive feature elimination, on seven relevant numerical and mixed healthcare datasets. Ten-fold cross-validation is used to evaluate classification performance. The experiments showed that the effect of feature selection methods on the performance of classification techniques varies.
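
To make the evaluation protocol concrete, the following is a minimal sketch of one classifier and feature selection pairing scored with ten-fold cross-validation. It is not the paper's implementation, which is not shown here. Everything in it is an assumption made for illustration: the toy data are synthetic (two informative features plus six noise features), the selector is a plain filter that ranks features by absolute Pearson correlation with the label (a simplification, not the full correlation-based feature selection method), the classifier is a small hand-written KNN, and all function names are hypothetical. Feature selection is repeated inside each training fold so that the held-out fold does not influence which features are kept.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_correlated(X, y, k):
    """Indices of the k features with the largest |correlation| to the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def k_fold_indices(n, n_folds, rng):
    """Shuffle 0..n-1 and split the indices into n_folds nearly equal test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(X, y, n_folds=10, n_keep=None, seed=0):
    """Mean accuracy over folds; if n_keep is set, filter features per training fold."""
    folds = k_fold_indices(len(X), n_folds, random.Random(seed))
    accs = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        cols = list(range(len(X[0])))
        if n_keep is not None:
            # Select features using the training fold only.
            cols = top_correlated(X_tr, y_tr, n_keep)
        X_tr = [[row[j] for j in cols] for row in X_tr]
        correct = 0
        for i in test_idx:
            x = [X[i][j] for j in cols]
            correct += knn_predict(X_tr, y_tr, x) == y[i]
        accs.append(correct / len(test_idx))
    return sum(accs) / len(accs)


def make_toy_data(n=200, seed=1):
    """Synthetic binary data: features 0-1 depend on the label, features 2-7 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(6)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
acc_all = cv_accuracy(X, y)
acc_selected = cv_accuracy(X, y, n_keep=2)
print(f"all features: {acc_all:.3f}, top-2 by correlation: {acc_selected:.3f}")
```

Running the same folds with and without the filter gives one comparison of the kind the study reports; repeating it over several classifiers, selectors, and datasets would yield the full grid, and the direction of the difference need not be the same in every cell.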
