Abstract

Data mining is the process of extracting informative and useful rules or relations from existing data that can be used to make predictions about the values of new instances. A wide range of commercial and open source software programs are used for data mining. This study compares several classification algorithms included in open source tools such as WEKA, Tanagra and Scikit-learn, using the SEER (Surveillance, Epidemiology, and End Results) data set, which consists of 60,948 instances.

Key words: Data mining, classification analysis, open source data mining tools.

Highlights

  • A wide range of algorithms can be used to extract information from data for data mining purposes

  • Weka is an open source data mining tool developed at the University of Waikato

  • The highest accuracy achieved with Weka is obtained with KStar, at 85.44%, followed by the J48 algorithm at 84.23%

Summary

Introduction

A wide range of algorithms can be used to extract information from data for data mining purposes. Because data mining is applied in many different areas, research on it continues steadily: new methods are being developed and existing ones are being enhanced. This study compares several classification algorithms for data mining on health data.
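As a rough illustration of how such a comparison can be structured, the sketch below evaluates two toy classifiers on the same cross-validation folds and reports accuracy and error rate for each. It is not the study's procedure: the data are synthetic (the SEER data set is not reproduced here), the two classifiers (a majority-class baseline and a 1-nearest-neighbour rule) are stand-ins for the algorithms offered by WEKA, Tanagra and Scikit-learn, and all function names are hypothetical.

```python
import random
from collections import Counter


def make_data(n=200, seed=0):
    """Synthetic two-class data (an assumption, not SEER): one numeric
    feature, with class 1 shifted upwards relative to class 0."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # exactly balanced classes
        x = rng.gauss(3.0 * label, 1.0)
        data.append((x, label))
    rng.shuffle(data)
    return data


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbour_classifier(train):
    """1-nearest-neighbour on the single feature."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def cross_validated_accuracy(build, data, k=10):
    """Pooled accuracy over k folds; every algorithm sees the same folds."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = build(train)
        correct += sum(predict(x) == label for x, label in test)
    return correct / len(data)


def compare(algorithms, data, k=10):
    """Return (name, accuracy %, error rate %) tuples, best accuracy first."""
    rows = []
    for name, build in algorithms.items():
        acc = 100.0 * cross_validated_accuracy(build, data, k)
        rows.append((name, acc, 100.0 - acc))
    return sorted(rows, key=lambda r: r[1], reverse=True)


data = make_data()
results = compare(
    {"majority": majority_classifier, "1-NN": nearest_neighbour_classifier},
    data,
)
for name, acc, err in results:
    print(f"{name:10s} accuracy={acc:6.2f}%  error={err:6.2f}%")
```

Using the same folds for every algorithm keeps the comparison fair, and reporting the error rate as 100 minus the accuracy matches the way the two measures are usually tabulated side by side.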

Results
Conclusion
