Background
Tumor mutation burden (TMB) has been considered a biomarker for the use of immune checkpoint inhibitors (ICIs). However, measuring TMB by next-generation sequencing, whether by whole exome sequencing (WES) or a cancer gene panel (CGP), is costly. Here, we used transcriptome data from The Cancer Genome Atlas (TCGA) to construct a model that predicts TMB in gastrointestinal tumors.

Methods
We obtained transcriptome, somatic mutation, and clinical data from TCGA for four gastrointestinal tumor types: esophageal cancer (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and rectal adenocarcinoma (READ). Using R, we:
- visualized the somatic mutation data;
- performed functional enrichment analysis of differentially expressed genes (DEGs);
- ran gene set enrichment analysis (GSEA);
- evaluated the clinical value of TMB.

Finally, we constructed a deep neural network (DNN) model to predict TMB.

Results
Visualization of the somatic mutation data summarized the mutation classes, the frequency of each mutation type, and the most frequently mutated genes. GSEA showed enrichment of CD4+/CD8+ T cells and activation of tumor-suppressive pathways in the high-TMB group. Single-sample GSEA (ssGSEA) showed that the high-TMB group had higher infiltration of multiple immune cell types. The distribution of TMB was also associated with clinical parameters, including age, M stage, N stage, AJCC stage, and overall survival (OS). The model was optimized with a genetic algorithm. After optimization, the Pearson correlation coefficient r between predicted and actual values was 0.98, 0.82, and 0.92 in the training, validation, and testing sets, respectively. The corresponding coefficient of determination R² was 0.95, 0.82, and 0.7.

Conclusion
TMB correlates with clinicopathological parameters in gastrointestinal carcinoma, and patients with high TMB have higher levels of immune infiltration. In addition, the DNN model based on 31 genes predicts the TMB of gastrointestinal tumors with high accuracy.
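The abstract does not state how TMB was calculated from the somatic mutation data, or exactly how r and R² were computed. The Python sketch below shows one common convention. TMB is taken as the number of nonsynonymous somatic variants divided by the exome size in megabases. The sketch also computes Pearson's r and R² = 1 − SS_res/SS_tot between actual and predicted TMB.

It rests on several assumptions that are not taken from the study:
- the 38 Mb exome size (`EXOME_SIZE_MB`);
- the set of MAF `Variant_Classification` labels counted as nonsynonymous;
- the helper names `tmb_per_mb`, `pearson_r` and `r_squared`, which are hypothetical.

The DNN and the genetic-algorithm optimization are not reproduced here.

```python
import math
from typing import Sequence

# Assumption: exome size in megabases used as the TMB denominator.
# The abstract does not say which value was used; 38 Mb is a hypothetical choice.
EXOME_SIZE_MB = 38.0

# Assumption: MAF Variant_Classification labels counted as nonsynonymous.
# Silent and non-coding classes are excluded.
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del",
    "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Nonstop_Mutation", "Translation_Start_Site",
}


def tmb_per_mb(variant_classes: Sequence[str],
               exome_size_mb: float = EXOME_SIZE_MB) -> float:
    """Hypothetical helper: nonsynonymous variants per megabase for one sample."""
    n_nonsyn = sum(1 for vc in variant_classes if vc in NONSYNONYMOUS)
    return n_nonsyn / exome_size_mb


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    if sd_t == 0 or sd_p == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return cov / (sd_t * sd_p)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # 76 nonsynonymous + 20 silent variants over 38 Mb -> 2.0 mutations/Mb
    sample = ["Missense_Mutation"] * 76 + ["Silent"] * 20
    print(tmb_per_mb(sample))  # 2.0

    # Predictions that track the truth perfectly but are shifted by +1
    actual = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(round(pearson_r(actual, predicted), 3))  # 1.0
    print(round(r_squared(actual, predicted), 3))  # 0.2
```

The usage example shows why it is worth reporting both metrics. Pearson's r is unchanged by a constant shift or rescaling of the predictions, while R² penalizes such systematic error. With this definition, R² can never exceed r².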