Abstract
Genetic variants that impact pre-mRNA splicing can result in aberrant protein production and disease. Most diagnostic bioinformatic pipelines easily identify splice-altering variants within canonical splice acceptor and donor sites. However, non-canonical splice variants are missed by most pipelines. To address this need, we developed Introme, a bioinformatic tool designed to identify both canonical and non-canonical splice-altering variants. Introme uses machine learning to integrate predictions from multiple splice detection tools (SpliceAI, MMSplice, dbscSNV, Branchpointer, SPIDEX and ESEFinder) with allele frequency and conservation to evaluate the likelihood of a splice-altering impact. We systematically curated 906 functionally validated splice-altering variants and 565 variants with no splicing impact from the literature. Eighty percent of these variants were used to optimise a machine learning classifier, and the remaining 20% were used to test performance. Introme outperformed all previous splice variant detection tools (area under the receiver operating characteristic curve (AUC): 0.96), including SpliceAI (AUC 0.93) and MMSplice (AUC 0.81). Using Introme, we identified 15 non-canonical splice-altering variants in a cohort of genetically unresolved neuromuscular patients. These included one nonsense variant with an additional, previously unrecognised splicing impact that appears to have reduced the severity of the presenting clinical phenotype. The discovery of these additional splice-altering variants has resulted in a newly confirmed genetic diagnosis in multiple patients. Introme has also identified 3606 ClinVar-reported variants of uncertain significance in neuromuscular disease genes that are likely to have a significant impact on splicing. Introme is a powerful new splice variant detection tool which promises to significantly enhance our ability to detect diagnostically relevant splice-altering variants in neuromuscular patients.
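For readers unfamiliar with the evaluation scheme described above, the following is a minimal sketch of an 80/20 hold-out split and an AUC calculation. It is not Introme's implementation: the function names `stratified_split` and `roc_auc` are hypothetical, and stratifying the split by class and fixing a random seed are assumptions, since the abstract does not state how the 20% test set was drawn. Only the class counts (906 and 565) come from the text.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract: 1 = splice-altering, 0 = no impact.
labels = [1] * 906 + [0] * 565
train_idx, test_idx = stratified_split(labels)
```

With these counts the sketch holds out 294 variants (181 splice-altering, 113 with no impact) and keeps 1177 for training. In practice `roc_auc` would be applied to a classifier's scores on the held-out variants only, so that the reported AUC reflects data the model was not optimised on.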