Microsatellite instability (MSI) is a hypermutator phenotype caused by DNA mismatch repair deficiency. MSI has been reported in various human cancers, particularly colorectal, gastric and endometrial cancers, and is a promising biomarker for cancer prognosis and immune checkpoint blockade immunotherapy. Several computational methods have been developed to detect MSI from DNA- or RNA-based next-generation sequencing data. Epigenetic mechanisms, such as DNA methylation, regulate gene expression and play critical roles in the development and progression of cancer. Here, we developed MSI-XGNN, a new computational framework for predicting MSI status from bulk RNA-sequencing and DNA methylation data. MSI-XGNN is an explainable deep learning model that combines a graph neural network (GNN), which extracts features from the gene-methylation probe network, with a CatBoost model that classifies MSI status. MSI-XGNN, which requires only tumor samples, performed comparably to two well-known methods that require tumor-normal paired sequencing data, MSIsensor and MANTIS, and outperformed several other tools. MSI-XGNN also generalized well to independent validation datasets. The optimal feature subset identified by MSI-XGNN comprised six MSI markers: four methylation probes (EPM2AIP1|MLH1:cg14598950, EPM2AIP1|MLH1:cg27331401, LNP1:cg05428436 and TSC22D2:cg15048832) and two genes (RPL22L1 and MSH4). All six markers were significantly associated with tumor microenvironment characteristics favorable for immunotherapy, including tumor mutation burden, neoantigens and immune checkpoint molecules such as programmed cell death-1 and cytotoxic T-lymphocyte antigen-4. Overall, our study provides a powerful and explainable deep learning model for predicting MSI status and identifying MSI markers that could potentially be used for clinical MSI evaluation.