1554 Background: Microsatellite instability (MSI) occurs via defects in mismatch repair (MMR) and is characterized by high tumor mutational burden (TMB) and a robust inflammatory response. Accurate MSI-high classification of immune-evasive tumors is vital because of the effectiveness of immune checkpoint inhibitor (ICI) therapy such as pembrolizumab, which is FDA approved for all MSI-high tumors irrespective of anatomic origin. Methods for detecting MSI status vary widely and include immunohistochemistry (IHC), polymerase chain reaction (PCR), and next-generation sequencing (NGS). We developed a machine learning (ML) model to predict MSI status in patients with solid tumors using comprehensive genomic and immune profiling (CGIP), independent of direct sequencing data from microsatellite sites. Methods: We analyzed samples from 1,838 patients with colorectal cancer (CRC) by CGIP, which included DNA panel testing (523 genes) for pathogenic single nucleotide variants (pSNVs) and determination of TMB (mut/Mb), and RNA sequencing (397 genes) for gene expression (GEX). The Boruta algorithm was used to select key genomic and GEX changes associated with MSI status. A distributed gradient boosting algorithm created a predictive model that was trained on 70% of the cohort. The model was tested on the remaining CRC cohort (Test), on PanCancer Atlas CRCs (TCGA), on endometrial adenocarcinomas (EMCA), and on CRCs with indeterminate MSI results by CGIP. Performance of the trained model was assessed using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Results: Feature selection identified 79 genes with pSNVs, 63 GEX changes, and TMB as informative for MSI prediction. The model showed high predictive accuracy in differentiating MSI-high tumors. Of the 39 cases that failed the MSI component of CGIP (Indeterminate), 17 had MMR IHC results available, of which 1 demonstrated loss of MLH1/PMS2 and the other 16 showed intact (normal) expression.
The model classified 3 cases in this indeterminate cohort as MSI-high, at least one of which was confirmed by IHC. No cases with intact MMR protein IHC were identified as MSI-high by the algorithm. Conclusions: Our ML-driven approach accurately assessed MSI status of CRC and EMCA using CGIP data. This approach also identified potential cases with MSI-high status where direct sequencing of microsatellites failed. This study highlights a method to identify patients with potential MSI-high status for orthogonal screening when the MSI component of a test fails. [Table: see text]
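The four performance measures named in the Methods can be computed directly from paired reference and predicted MSI calls. The following is a minimal sketch, not the authors' implementation: the function name `msi_metrics`, the `BinaryMetrics` container, and the example labels are all hypothetical, and MSI-high is assumed to be the positive class.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the four measures named in the abstract."""
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a measure is undefined
    # (for example, PPV when the model makes no positive calls).
    return numerator / denominator if denominator else math.nan


def msi_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> BinaryMetrics:
    """Compute sensitivity, specificity, PPV, and NPV.

    Assumption: True means MSI-high (positive class), False means
    microsatellite stable. y_true holds the reference calls and
    y_pred holds the model's calls.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Made-up labels for illustration only; these are not study data.
example_true = [True, True, True, False, False, False, False, False]
example_pred = [True, True, False, False, False, False, False, True]
example = msi_metrics(example_true, example_pred)
```

With these made-up labels there are 2 true positives, 1 false negative, 1 false positive, and 4 true negatives, so sensitivity and PPV are both 2/3 while specificity and NPV are both 4/5. Because PPV and NPV depend on the prevalence of MSI-high cases, they would be expected to differ between cohorts even for a fixed model.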