Abstract

Machine learning algorithms have become robust tools for analyzing DNA sequences in cancer research, where such analysis is critical for early detection and risk assessment. Despite notable advances, there remains a need for a predictive model that estimates cancer risk with high accuracy. This study addresses that need with several classification algorithms: Logistic Regression, Gradient Boosting, Gaussian Naive Bayes, and a Blending method that combines Logistic Regression and Gaussian Naive Bayes. Hyperparameters are tuned by grid search, and the models are used to predict cancer occurrence in a cohort of 390 individuals with characterized DNA sequences. The Blending method shows the best predictive performance in distinguishing five specific types of cancer: BRCA1 (Breast Cancer gene 1), KIRC-2 (Kidney Renal Clear Cell Carcinoma), COAD-3 (Colorectal Adenocarcinoma), LUAD-4 (Lung Adenocarcinoma), and PRAD-5 (Prostate Adenocarcinoma), with accuracy ranging from 96% to 100%. Notably, it significantly surpasses the individual algorithms in predicting LUAD-4 and PRAD-5, where it attains an accuracy of 98%. The improvement is also evident in the micro-average and macro-average ROC curves, which reach 99%. These findings highlight the potential of the Blending method as a valuable tool for more accurate and effective cancer prediction.
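The abstract does not specify how the Blending method is constructed, so the following is only a minimal sketch of one common form of blending, written with scikit-learn. It assumes a hold-out scheme in which grid-searched Logistic Regression and Gaussian Naive Bayes models produce class probabilities that a second Logistic Regression (a hypothetical meta-learner) combines. The data are synthetic stand-ins with the same sample count and number of classes as the study, and the parameter grids, split sizes, and helper names (`tuned`, `meta_features`) are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Synthetic stand-in for a 390-sample, five-class data set (NOT the study's data).
X, y = make_classification(
    n_samples=390, n_features=20, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Keep a test set aside, then split the rest into a base-model training
# part and a hold-out part used only to fit the blender.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_base, X_hold, y_base, y_hold = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=0)


def tuned(estimator, grid):
    """Grid-search an estimator on the base split and return the best model."""
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_base, y_base)
    return search.best_estimator_


# Assumed (illustrative) hyperparameter grids.
log_reg = tuned(LogisticRegression(max_iter=2000), {"C": [0.1, 1.0, 10.0]})
gnb = tuned(GaussianNB(), {"var_smoothing": [1e-9, 1e-7, 1e-5]})


def meta_features(X_part):
    """Stack the base models' class probabilities side by side."""
    return np.hstack([m.predict_proba(X_part) for m in (log_reg, gnb)])


# The blender learns how to weight the two base models' probabilities.
blender = LogisticRegression(max_iter=2000)
blender.fit(meta_features(X_hold), y_hold)

blend_pred = blender.predict(meta_features(X_test))
blend_proba = blender.predict_proba(meta_features(X_test))
blend_accuracy = accuracy_score(y_test, blend_pred)

# Micro- and macro-averaged ROC AUC over the one-vs-rest binarized labels.
y_test_bin = label_binarize(y_test, classes=blender.classes_)
auc_micro = roc_auc_score(y_test_bin, blend_proba, average="micro")
auc_macro = roc_auc_score(y_test_bin, blend_proba, average="macro")

print(f"accuracy={blend_accuracy:.3f} "
      f"micro AUC={auc_micro:.3f} macro AUC={auc_macro:.3f}")
```

Fitting the blender on a hold-out split that the base models never saw keeps it from simply rewarding whichever base model overfits the training data most. The scores printed for this synthetic data say nothing about the figures reported in the study.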
