Abstract

A large proportion of lung adenocarcinomas (LUAD) arise in lifetime nonsmokers, and the known major tobacco biomarkers have low sensitivity; together these make it urgent to identify reliable molecular signatures for personalized treatment. Moreover, cancer symptomatology is presumed to depend strongly on modules of functionally related genes rather than on a single important gene. Our aims, therefore, are to identify signature genes by optimizing the tobacco exposure pattern (TEP) classification model and to uncover their interaction relationships at different molecular levels. A new method, TTZ, is proposed to extract features as input variables for the TEP classification model. Based on the Z-curve method, TTZ extracts features not only from mutation frequencies but also from the sequencing information of insertions and deletions. Two independent LUAD datasets, The Cancer Genome Atlas (TCGA) and Broad data, are downloaded to train and test the TEP classification model. Thirty-four genes are identified as tobacco-related mutational signature genes, with accuracies of 93.55% and 92.65% on the training and validation data, respectively. Inference of genetic and protein-protein interaction (PPI) networks reveals that LAMA1, EGFR, KRAS and TNN are the most connected core genes. Six signature genes are shown to be significantly involved in the cilium damage pathway, which is considered one of the root causes of lung cancer. The identified signature genes may serve as potential drug targets for precision medicine in LUAD. Most importantly, the TTZ feature extraction method can be readily extended to the identification of mutational signatures in other diseases or cancers.
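For readers unfamiliar with the Z-curve method on which TTZ is based, the following is a minimal Python sketch of the classic Z-curve transform of a DNA sequence into three cumulative coordinates (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). It is not the TTZ method itself: how TTZ adapts the transform to mutation frequencies and indel sequences is not described in this section, and the function name `z_curve` is a hypothetical choice made for illustration.

```python
# Classic Z-curve step signs per base (an illustration, not the TTZ method):
#   x: purine (A, G) vs. pyrimidine (C, T)
#   y: amino (A, C) vs. keto (G, T)
#   z: weak H-bond (A, T) vs. strong H-bond (G, C)
_STEPS = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def z_curve(sequence):
    """Return the cumulative (x, y, z) coordinates after each base.

    `z_curve` is a hypothetical helper name; any character other than
    A, C, G or T raises a ValueError.
    """
    points = []
    x = y = z = 0
    for base in sequence.upper():
        if base not in _STEPS:
            raise ValueError(f"unsupported base: {base!r}")
        dx, dy, dz = _STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in z_curve("ACGT"):
        print(point)
```

The final point of the curve summarizes base composition, so vectors of this kind can in principle serve as numeric features for a classifier.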

Highlights

  • Lung cancer has been the leading cause of cancer-related mortality throughout the world for decades [1]

  • Even though tobacco smoking is the major risk factor for lung cancer, 10-15% of cancer patients in the Western world have no history of tobacco exposure [2], [3]

  • Somatic variants in the whole exome sequencing (WXS) data of The Cancer Genome Atlas (TCGA) (Legacy Genomic Data Commons, https://portal.gdc.cancer.gov/projects) were called with the MuTect Variant Calling Pipeline



Introduction

Lung cancer has been the leading cause of cancer-related mortality throughout the world for decades [1]. Cigarette smokers are 15-30 times more likely to get lung cancer or die from it than lifetime nonsmokers. Even though tobacco smoking is the major risk factor for lung cancer, 10-15% of cancer patients in the Western world have no history of tobacco exposure [2], [3]. When considering therapies for LUAD patients, the carcinogenic mechanisms in smokers are believed to differ from those in nonsmokers [7]–[9]. Take the two well-known major mutations frequently present in LUADs, KRAS and EGFR mutations, as examples: Riely et al. [10] found that KRAS mutations in LUADs occurred at a frequency of only 25% in smokers but at a frequency as high as 15% in nonsmokers. By contrast, in a large meta-analysis study, Ren et al. [11]

