Abstract
A large amount of abnormal information arises during the development of lung cancer, and extracting useful knowledge from this mass of information is an urgent problem. Data mining has become a popular tool for medical classification and prediction. However, each technique has its own advantages and disadvantages, so several data mining methods were applied to analyze the data step by step, and the prediction results of the different models were compared. A total of 180 patients with lung cancer and 243 individuals with benign lung disease were recruited from the First Affiliated Hospital of Zhengzhou University between October 2014 and March 2016. Prediction models based on epidemiological data, clinical features and tumor markers were developed using an artificial neural network (ANN), the decision tree C5.0 and a support vector machine (SVM). The lung cancer group and the benign lung disease group differed significantly in seven tumor markers and 10 epidemiological and clinical indicators. The accuracy rates of ANN, C5.0 and SVM were 76.47, 89.92 and 85.71%, respectively. Receiver operating characteristic (ROC) curve analysis showed that the area under the ROC curve (AUC) was 0.811 (0.770-0.847) for ANN, 0.897 (0.864-0.924) for C5.0 and 0.878 (0.843-0.908) for SVM. Of the three models, the decision tree C5.0 had the lowest error rate and the highest accuracy, and it could be used to diagnose lung cancer.
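The two metrics used to compare the models, accuracy and AUC, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0.5 decision threshold and the six-sample toy data are all assumptions made for illustration; only the figures in the `reported` dictionary come from the abstract. The AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, which is one standard way to obtain the area under the ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = lung cancer, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Figures reported in the abstract: (accuracy in %, AUC).
reported = {
    "ANN": (76.47, 0.811),
    "C5.0": (89.92, 0.897),
    "SVM": (85.71, 0.878),
}

# Error rate is taken as 100 minus accuracy; pick the most accurate model.
error_rates = {name: round(100 - acc, 2) for name, (acc, _) in reported.items()}
best_model = max(reported, key=lambda name: reported[name][0])

print(f"toy accuracy = {toy_accuracy:.3f}, toy AUC = {toy_auc:.3f}")
print(f"error rates (%) = {error_rates}")
print(f"best model by accuracy = {best_model}")
```

On the reported figures, C5.0 leads on both accuracy and AUC, which matches the conclusion stated in the abstract.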
More From: European journal of cancer prevention : the official journal of the European Cancer Prevention Organisation (ECP)