Abstract

Healthcare is a rapidly growing industry in both developed and developing countries. The expansion of technology has facilitated the storage and analysis of the diverse data that the healthcare industry generates. Data mining algorithms have been employed in the healthcare industry for the past few years for a variety of decision-making and predictive-analysis tasks. Classification algorithms have been widely used for early detection of disease symptoms among patients. However, selecting a suitable classifier for a particular dataset remains an important problem in many healthcare applications. This paper puts forward an empirical comparison of five important classifiers built using decision trees, Bayesian learning, support vector machines and ensemble learning on twelve UCI healthcare datasets. The experimental results are examined from multiple perspectives, namely accuracy, precision, recall and F-measure.

Highlights

  • The amount of data generated in today's world is humongous

  • Column 1 of the table lists the datasets, column 2 lists the five classification algorithms applied to each dataset, and columns 3 to 6 give the performance measures (Accuracy, Recall, Precision and F1 Score) obtained by applying each algorithm to that dataset

  • Gradient Boosting ranked first on all measures, followed by Random Forest; Support Vector Machines (SVM) ranked third on all metrics, followed by Conditional Inference Trees and Naïve Bayes, in that order

Summary

Introduction

The amount of data generated in today's world is humongous. It is not difficult to find databases with terabytes of data in enterprises and research facilities. Deloitte Touche Tohmatsu India has predicted that, with increased digital adoption, the Indian healthcare market is likely to grow at a CAGR of 23 per cent. Per capita spending in India was only USD 40 in 2010, far below that of developed nations such as the USA and the UK, where it was USD 7,285 and USD 3,867 respectively. It is also well below the worldwide per capita expenditure of USD 802. Written procedures and guidelines for data classification should define what categories and criteria the organization will use to classify data. The performance of a classifier is evaluated using several criteria, such as accuracy, precision and recall.
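
The four evaluation criteria named in the abstract can be illustrated with a minimal sketch for a two-class problem. This is not the paper's evaluation code: the function name `binary_metrics`, the `Metrics` container, and the convention of returning 0.0 when a denominator is zero are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the chosen positive label.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)
```

Calling `binary_metrics(actual_labels, predicted_labels)` on the held-out predictions of each classifier would give one row of a comparison table of the kind described in the highlights. For multi-class datasets the per-class values would additionally need to be averaged, a choice this sketch leaves open.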
