Abstract

In recent years, the application of data mining methods in the health industry has received increased attention from both health professionals and scholars. This paper presents a data mining framework for detecting breast cancer, based on real data from one of Iran's hospitals, that applies association rules and the most commonly used classifiers. The former were adopted to reduce the size of the datasets, while the latter were chosen for cancer prediction. A k-fold cross-validation procedure was used to evaluate the performance of the classifiers. Among the six classifiers used in this paper, the support vector machine achieved the best results, with an accuracy of 93%. The proposed approach can also be applied to detecting other diseases.
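As an illustration of the evaluation step, the following is a minimal sketch of k-fold cross-validation in plain Python. It is not the authors' implementation: the abstract does not state the value of k or whether the folds were shuffled or stratified, so this sketch assumes contiguous, unshuffled folds. The function names and the `majority_class_fit` toy classifier are hypothetical stand-ins for the six classifiers used in the paper.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    Assumption: no shuffling or stratification is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_validate(X: Sequence, y: Sequence, fit: Callable, k: int) -> float:
    """Return the mean test accuracy over k folds.

    `fit(train_X, train_y)` must return a function mapping one sample to a label.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class_fit(train_X: Sequence, train_y: Sequence) -> Callable:
    """Hypothetical toy classifier: always predicts the most common training label."""
    labels = list(train_y)
    most_common = max(set(labels), key=labels.count)
    return lambda sample: most_common


if __name__ == "__main__":
    # Made-up toy data, not taken from the paper.
    X = [[i] for i in range(12)]
    y = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    print(cross_validate(X, y, majority_class_fit, k=3))
```

Each sample is used for testing exactly once, so the averaged score is less sensitive to a single lucky or unlucky train/test split. A real classifier would replace `majority_class_fit` without changes to the loop.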

Highlights

  • Data mining, or knowledge discovery in databases, can be defined as the process of discovering hidden patterns in large amounts of data

  • Health centres generate large amounts of complex data about disease diagnosis records that are difficult to analyse, yet such analysis is in demand because it can retrieve important information that may help health professionals in their future decision making

  • The feature selection procedure proposed in this study is based on association rules, which are considered a viable data mining approach for selecting the most significant features among those that best characterize the presence of breast cancer
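To make the idea of association-rule-based feature selection concrete, here is a minimal sketch that scores single-antecedent rules of the form "feature=value -> class" by support and confidence, and keeps the features that appear in at least one strong rule. This is an assumption-laden illustration, not the procedure from the paper: the rule-mining algorithm and the thresholds used by the authors are not given in this section, and the feature names, labels, and records below are invented toy values.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Record = Tuple[Dict[str, str], str]  # (feature -> value, class label)

# Made-up toy records, not taken from the paper.
TOY_RECORDS: List[Record] = [
    ({"feature_a": "yes", "feature_b": "no"}, "positive"),
    ({"feature_a": "yes", "feature_b": "yes"}, "positive"),
    ({"feature_a": "yes", "feature_b": "no"}, "positive"),
    ({"feature_a": "no", "feature_b": "yes"}, "negative"),
    ({"feature_a": "no", "feature_b": "no"}, "negative"),
    ({"feature_a": "no", "feature_b": "yes"}, "positive"),
]


def rule_stats(records: List[Record], target: str) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Compute (support, confidence) for every rule (feature=value) -> target.

    support    = fraction of all records containing the item AND the target class
    confidence = fraction of records containing the item that have the target class
    """
    n = len(records)
    item_count: Counter = Counter()
    joint_count: Counter = Counter()
    for features, label in records:
        for item in features.items():
            item_count[item] += 1
            if label == target:
                joint_count[item] += 1
    return {
        item: (joint_count[item] / n, joint_count[item] / item_count[item])
        for item in item_count
    }


def select_features(records: List[Record], target: str,
                    min_support: float, min_confidence: float) -> Set[str]:
    """Keep features that are the antecedent of at least one rule meeting both thresholds."""
    selected = set()
    for (feature, _value), (support, confidence) in rule_stats(records, target).items():
        if support >= min_support and confidence >= min_confidence:
            selected.add(feature)
    return selected


if __name__ == "__main__":
    print(select_features(TOY_RECORDS, "positive", min_support=0.3, min_confidence=0.8))
```

Only the selected features would then be passed on to the classifiers, which is how rule mining can shrink the dataset before prediction. A full rule miner would also consider antecedents with several items; this sketch restricts itself to single items for brevity.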


Summary

Introduction

Data mining, or knowledge discovery in databases, can be defined as the process of discovering hidden patterns in large amounts of data. It can be applied in various industries, such as healthcare (Brahami, Atmani et al. 2013), risk detection (Koyuncugil and Ozgulbas 2012) and fraud detection (Akhilomen 2013). Data mining has received much attention from health scholars and professionals. Its benefits in the health industry include providing fast and practical solutions to patients at a lower cost, detecting the causes of illness and recommending medical treatment methods, and building drug recommendation systems. Health centres generate large amounts of complex data about disease diagnosis records that are difficult to analyse, yet such analysis is in demand because it can retrieve important information that may help health professionals in their future decision making. Data mining offers a way to overcome these difficulties through three common methods: classification, clustering and association rules.

