Abstract

The present study aimed to identify the feature genes associated with smoking in lung adenocarcinoma (LAC) samples and explore the underlying mechanism. Three gene expression datasets of LAC samples were downloaded from the Gene Expression Omnibus database through pre-set criteria and the expression data were processed using meta-analysis. Differentially expressed genes (DEGs) between LAC samples of smokers and non-smokers were identified using limma package in R. The classification accuracy of selected DEGs were visualized using hierarchical clustering analysis in R language. A protein-protein interaction (PPI) network was constructed using gene interaction data from the Human Protein Reference Database for the DEGs. Betweenness centrality was calculated for each node in the network and genes with the greatest BC values were utilized for the construction of the support vector machine (SVM) classifier. The dataset GSE43458 was used as the training dataset for the construction and the other datasets (GSE12667 and GSE10072) were used as the validation datasets. The classification accuracy of the classifier was tested using sensitivity, specificity, positive predictive value, negative predictive value and area under curve parameters with the pROC package in R language. The feature genes in the SVM classifier were subjected to pathway enrichment analysis using Fisher's exact test. A total of 347 genes were identified to be differentially expressed between samples of smokers and non-smokers. The PPI network of DEGs were comprised of 202 nodes and 300 edges. An SVM classifier comprised of 26 feature genes was constructed to distinguish between different LAC samples, with prediction accuracies for the GSE43458, GSE12667 and GSE10072 datasets of 100, 100 and 94.83%, respectively. Furthermore, the 26 feature genes that were significantly enriched in 9 overrepresented biological pathways, including extracellular matrix-receptor interaction, proteoglycans in cancer, cell adhesion molecules, p53 signaling pathway, microRNAs in cancer and apoptosis, were identified to be smoking-related genes in LAC. In conclusion, an SVM classifier with a high prediction accuracy for smoking and non-smoking samples was obtained. The genes in the classifier may likely be the potential feature genes associated with the development of patients with LAC who smoke.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.