Abstract
Lung cancer is one of the three deadliest diseases in the world, and its prevalence has grown rapidly. This study therefore sets out to identify the factors that most influence a lung cancer diagnosis, using data mining and classification techniques. Several popular classification algorithms are compared to determine which is most accurate for lung cancer classification: K-Nearest Neighbor, Random Forest Classifier, Logistic Regression, Linear SVM, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, Kernel SVM, and MLPClassifier. These algorithms were chosen because earlier work found on the Kaggle platform compared them on a breast cancer dataset. Previous studies of breast cancer reported accuracies of 96.47% for SVM, 97.06% for Neural Networks, and 91.18% for Naïve Bayes. This study differs from that earlier work in applying the full set of algorithms listed above and in testing whether their accuracy differs on a lung cancer dataset. Random Forest and Gradient Boosting were the most accurate, each reaching an accuracy of 100%. The previous studies identified the same two algorithms as the most accurate, although there Gradient Boosting scored higher than Random Forest. Based on the data used in this study, the most influential factors in predicting a lung cancer diagnosis are obesity and coughing up blood.
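The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the lung cancer data, and the binary label, split ratio, scaling step, and hyperparameters are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the lung cancer dataset from the study.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def scaled(model):
    """Standardize features before fitting scale-sensitive models."""
    return make_pipeline(StandardScaler(), model)


# One entry per algorithm family named in the abstract (default settings assumed).
models = {
    "K-Nearest Neighbor": scaled(KNeighborsClassifier()),
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "Linear SVM": scaled(LinearSVC(max_iter=10000)),
    "Kernel SVM": scaled(SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "MLPClassifier": scaled(MLPClassifier(max_iter=2000, random_state=0)),
}

# Fit every model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.4f}")

# Rank features by the fitted Random Forest's impurity-based importances.
forest = models["Random Forest"]
ranking = sorted(
    enumerate(forest.feature_importances_), key=lambda kv: kv[1], reverse=True
)
print("Most influential feature indices:", [idx for idx, _ in ranking[:2]])
```

With real data, the feature indices would map to column names such as symptoms or risk factors. A single train/test split can give optimistic scores on a small dataset, so cross-validation (for example `cross_val_score`) may be a more reliable basis for comparison.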
More From: Jurnal Sistem Informasi dan Ilmu Komputer Prima (JUSIKOM PRIMA)