The early and accurate diagnosis of lung cancer, a significant contributor to global cancer-related mortality, remains a major challenge in healthcare. Conventional diagnostic methods often lack the sensitivity required for early-stage detection, which has prompted the exploration of non-invasive alternatives. Leveraging advances in genomics and bioinformatics, this study investigates the potential of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) analysis for early-stage lung cancer diagnosis. Two distinct datasets are used: GSE4115, comprising gene expression data from bronchial airway epithelial cells of smokers, and GSE33356, which covers genomic alterations in non-smoking female lung cancer patients in Taiwan. The research applies data pre-processing and feature reduction techniques, including normalization and Kernel Principal Component Analysis (KPCA). Subsequently, an ensemble of diverse learners, including Random Forests, AdaBoost, Bagging, Support Vector Machines (SVMs), and Neural Networks, is trained on the original datasets. A novel ensemble stacking approach is proposed, in which the initial predictions of the base learners are combined by a logistic regression meta-learner to enhance predictive performance. The study aims to contribute to advancements in lung cancer detection by providing a more precise and non-invasive diagnostic method. By integrating DNA and RNA analysis with ensemble learning techniques, the research endeavours to improve medical outcomes and potentially save lives through early-stage lung cancer detection.
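The stacking scheme described above can be illustrated with a minimal sketch in scikit-learn. This is not the study's implementation: the data are a synthetic stand-in for an expression matrix rather than GSE4115 or GSE33356, and the ordering of the steps (standardization, then KPCA, then the stacked ensemble), the RBF kernel, the number of components, and all hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a (samples x genes) matrix; NOT the GEO datasets.
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the abstract; hyperparameters are illustrative guesses.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("bag", BaggingClassifier(random_state=0)),
    ("svm", SVC(random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)),
]

# Logistic regression acts as the meta-learner. It is fitted on out-of-fold
# outputs of the base learners (5-fold cross-validation inside the stack).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Assumed order: normalization, then KPCA feature reduction, then stacking.
model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=20, kernel="rbf"),
    stack,
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is the usual way to keep a stacked model from simply rewarding the most overfitted base learner. The accuracy printed here reflects only the synthetic data and says nothing about performance on the study's datasets.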