Abstract

Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.

Highlights

  • Insertion and deletion events comprise a diverse category of genetic variation that results in a range of phenotypic and molecular effects [1, 2]

  • The Human Gene Mutation Database (HGMD) Professional variants underlying the results presented in the study are available from http://www.hgmd.cf.ac.uk

  • We develop MutPred-Indel, a machine learning method to predict the pathogenicity of non-frameshifting insertion/deletion variation and, in addition, highlight structural and functional mechanisms potentially impacted by a given variant


Summary

Introduction

Insertion and deletion events comprise a diverse category of genetic variation that results in a range of phenotypic and molecular effects [1, 2]. The dozens of sequence-retaining insertion, deletion and complex indel variants, referred to here collectively as non-frameshifting insertion/deletion variants or “indels”, are significantly less well studied than single nucleotide substitutions. Non-frameshifting insertion/deletion variants result in the gain or loss of a number of nucleotides divisible by three, such that the reading frame of the mRNA is not disrupted. The resultant mutant protein sequence differs from the wildtype by the addition and/or deletion of one or more amino acid residues. Three types of protein-coding insertion/deletion variants are discussed: insertions, deletions, and complex indel variants. The less abundant complex indel variants arise when deletion and insertion events occur in tandem, and in this work comprise both deletion-insertion and complex substitution variants.
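
The divisible-by-three rule and the three variant types described above can be illustrated with a minimal Python sketch. This is not part of MutPred-Indel: the function name `classify_indel` and its inputs (the deleted and inserted nucleotide strings, with any shared flanking bases already removed) are hypothetical, and treating a complex variant as frame-preserving when its net length change is a multiple of three is an assumption of this example.

```python
from typing import Tuple


def classify_indel(deleted: str, inserted: str) -> Tuple[str, bool]:
    """Return (variant type, whether the reading frame is preserved).

    `deleted` and `inserted` are the nucleotides removed from and added to
    the coding sequence (hypothetical input format for this illustration).
    """
    if not deleted and not inserted:
        raise ValueError("a variant must delete or insert at least one nucleotide")

    if deleted and inserted:
        kind = "complex"      # deletion and insertion occurring in tandem
    elif inserted:
        kind = "insertion"
    else:
        kind = "deletion"

    # The reading frame is kept when the net gain or loss of nucleotides
    # is divisible by three (assumption: net change for complex variants).
    net_change = len(inserted) - len(deleted)
    non_frameshifting = net_change % 3 == 0
    return kind, non_frameshifting


if __name__ == "__main__":
    print(classify_indel("", "GCA"))         # three inserted bases
    print(classify_indel("ACGT", ""))        # four deleted bases
    print(classify_indel("ACG", "TTTGGG"))   # three deleted, six inserted
```

Under these assumptions, only variants for which the second returned value is true would fall within the non-frameshifting class the paper addresses.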
