Abstract

Cancer remains a major health concern because of its high mortality rate. Recent decades have seen substantial progress in cancer research, driven by advances in high-throughput sequencing technology and the application of diverse machine learning (ML) methods, particularly to the analysis of gene expression data. However, the proliferation of high-dimensional datasets, such as RNA sequencing (RNA-seq) data, underscores the need for more robust ML techniques that can handle large volumes of data efficiently and support accurate treatment decisions. This paper introduces a novel hybrid feature selection (FS) approach in which Game kernel SHapley Additive exPlanations (kSHAP) is combined with the binary Social Ski Driver (bSSD), Adaptive Beta Hill Climbing (ABHC), and Late Acceptance Hill Climbing (LAHC) algorithms. The study investigates the three resulting FS algorithms (kSHAP-bSSD, kSHAP-ABHC, and kSHAP-LAHC) for cancer classification tasks on RNA-seq datasets. Experiments were conducted on five well-established RNA-seq cancer datasets: Lung Adenocarcinoma (LUAD), Stomach Adenocarcinoma (STAD), Breast Invasive Carcinoma (BRCA), Lung Squamous Cell Carcinoma (LUSC), and Uterine Corpus Endometrial Carcinoma (UCEC). The objective is to improve the accuracy, robustness, and scalability of cancer classification on these datasets. The study also evaluates six classifiers: Autoencoder (AE), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), Neural Network (NN), and Random Forest (RF), with AE consistently outperforming the others. Evaluation covers accuracy, recall, precision, and F1-score, supported by box plots, radar plots, confusion matrices, ROC curves, and statistical analysis. The approach is compared against recent state-of-the-art FS algorithms and shows improvements in gene selection and classification accuracy.
kSHAP-bSSD outperforms traditional methods on all five datasets, reaching an accuracy of 99.9% on LUAD and showing higher accuracy and robustness on the STAD, BRCA, LUSC, and UCEC datasets. Assessment across multiple metrics confirms the advantage of the kSHAP-bSSD and kSHAP-ABHC combinations, underscoring their effectiveness in cancer classification tasks.
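
To make the local-search component concrete, the following is a minimal sketch of generic Late Acceptance Hill Climbing over a binary feature mask. It is not the paper's implementation. The `importance` scores are hypothetical stand-ins for a kSHAP-style gene ranking, and `toy_cost` is an assumed surrogate objective (reward important features, penalize subset size); a real wrapper would more plausibly score each mask with a classifier. The names `lahc_feature_selection`, `history_length`, and `max_iters` are illustrative only.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature (gene) selected, 0 = dropped


def lahc_feature_selection(
    cost: Callable[[Mask], float],
    initial_mask: Mask,
    history_length: int = 20,
    max_iters: int = 2000,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Generic Late Acceptance Hill Climbing that minimizes `cost` over masks."""
    rng = random.Random(seed)
    current = list(initial_mask)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    # The history list stores the costs the search held `history_length` steps ago.
    history = [current_cost] * history_length

    for it in range(max_iters):
        # Neighbor move: flip one randomly chosen bit of the mask.
        candidate = list(current)
        j = rng.randrange(len(candidate))
        candidate[j] ^= 1
        cand_cost = cost(candidate)

        # Late acceptance rule: accept if no worse than the current cost
        # or no worse than the cost recorded `history_length` steps earlier.
        slot = it % history_length
        if cand_cost <= current_cost or cand_cost <= history[slot]:
            current, current_cost = candidate, cand_cost
        if current_cost < best_cost:
            best, best_cost = list(current), current_cost
        history[slot] = current_cost

    return best, best_cost


# Hypothetical per-gene importance scores (assumed, not taken from the paper).
importance = [0.9, 0.05, 0.7, 0.01, 0.6, 0.02]


def toy_cost(mask: Mask) -> float:
    """Assumed surrogate: negative total importance plus a 0.1 penalty per gene."""
    selected_importance = sum(w for w, bit in zip(importance, mask) if bit)
    return -selected_importance + 0.1 * sum(mask)


best_mask, best_cost = lahc_feature_selection(toy_cost, [1] * len(importance))
print(best_mask, round(best_cost, 3))
```

Under this toy objective the search keeps only the genes whose importance exceeds the per-gene penalty. Replacing `toy_cost` with a cross-validated classification error would turn the sketch into a wrapper-style selector, at a much higher cost per evaluation.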
