A resource of variant effect predictions of single nucleotide variants in model organisms.

Omar Wagih, Danish Memon, Pedro Beltrao, Marco Galardini, Bede P Busby, Athanasios Typas

doi:10.15252/msb.20188430

Abstract

The effect of single nucleotide variants (SNVs) in coding and noncoding regions is of great interest in genetics. Although many computational methods aim to elucidate the effects of SNVs on cellular mechanisms, it is not straightforward to comprehensively cover different molecular effects. To address this, we compiled and benchmarked sequence‐ and structure‐based variant effect predictors and computed the impact of nearly all possible amino acid and nucleotide variants in the reference genomes of Homo sapiens, Saccharomyces cerevisiae and Escherichia coli. Studied mechanisms include protein stability, interaction interfaces, post‐translational modifications and transcription factor binding sites. We apply this resource to the study of natural and disease coding variants. We also show how variant effects can be aggregated to generate protein complex burden scores that uncover associations between protein complexes and phenotypes, based on a set of newly generated growth profiles of 93 sequenced S. cerevisiae strains in 43 conditions. This resource is available through mutfunc (www.mutfunc.com), a tool with which users can query precomputed predictions by providing amino acid or nucleotide‐level variants.

Highlights

One of the key challenges of biology is to understand how genetic variation drives changes in phenotypes
Genome-wide association studies (GWASs) are typically limited in their ability to explain the underlying mechanism that is influenced by the variant in question
Functional genomic regions display evolutionary constraint across yeast and human individuals
To set up the variant effect prediction approaches, we first derived, for E. coli, S. cerevisiae and H. sapiens, molecular information such as experimental and homology-based protein structural models for individual proteins and protein interfaces, transcription factor (TF) binding sites, protein kinase target sites, post-translational modification sites and linear motif regions (Methods)

Summary

Introduction

One of the key challenges of biology is to understand how genetic variation drives changes in phenotypes. GWASs are typically limited in their ability to explain the underlying mechanism that is influenced by the variant in question. This missing mechanistic layer severely limits our understanding of how variants cause phenotypic variability. Coding variants can affect post-translational modification (PTM) sites (Reimand et al, 2015; Wagih et al, 2015), protein folding and stability (Lorch et al, 2000), protein interaction interfaces (Engin et al, 2016) and sub-cellular localization (Björses et al, 2000), and can introduce premature stop codons.
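
As a small illustration of the simplest of these coding consequences, the sketch below labels a single nucleotide substitution in a coding sequence as synonymous, missense, nonsense (a premature stop codon) or stop-lost. It is a minimal example and not the mutfunc pipeline: it assumes the standard genetic code, an in-frame coding sequence given on the coding strand, and a 0-based position. The function name classify_snv and the output labels are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, position: int, alt_base: str) -> str:
    """Classify a single nucleotide substitution in an in-frame coding sequence.

    `position` is a 0-based index into `cds` (an assumption of this sketch).
    Returns one of: "synonymous", "missense", "nonsense", "stop_lost".
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 <= position < len(cds):
        raise ValueError("position is outside the coding sequence")
    if alt_base not in BASES or alt_base == cds[position]:
        raise ValueError("alt_base must be a DNA base different from the reference")

    # Locate the codon containing the variant and build the mutated codon.
    offset = position % 3
    codon_start = position - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


if __name__ == "__main__":
    # ATG TGG TAA encodes Met-Trp-stop; changing TGG to TGA creates a premature stop.
    print(classify_snv("ATGTGGTAA", 5, "A"))
```

Only the missense class requires the predictors discussed in this work (stability, interfaces, PTM sites); the other classes can be read directly off the codon table, which is why a lookup of this kind is a natural first step before any structure- or sequence-based scoring.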

Methods

Results

Conclusion