Abstract

The medical data are multidimensional, and are represented by a large number of features. Hundreds of independent features (parameters) in these high dimensional databases need to be simultaneously considered and analyzed, for valuable decision-making information in medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining task on high dimensional data, the data must be preprocessed by an efficient dimensionality reduction method. The aim of this study is to improve the diagnostic accuracy of diabetes disease by selecting informative features of Pima Indians Diabetes Dataset. This study proposes a Hybrid Prediction Model with F-score feature selection approach to identify the optimal feature subset of the Pima Indians Diabetes dataset. The features of diabetes dataset are ranked using F-score and the feature subset that gives the minimal clustering error is the optimal feature subset of the dataset. The correctly classified instances determine the pattern for diagnosis and are used for further classification process. The improved performance of the Support Vector Machine classifier measured in terms of Accuracy of the classifier, Sensitivity, Specificity and Area Under Curve (AUC) proves that the proposed feature approach indeed improves the performance of classification. The proposed prediction model achieves a predictive accuracy of 98.9427 and it is the highest predictive accuracy for diabetes dataset compared to other models in literature for this problem.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.