Abstract

Non-synonymous single nucleotide variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remains sparse, a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.

Highlights

  • The vast majority of human genomic variants are single nucleotide variants (SNVs) (Durbin, et al, 2010)

  • Single amino acid variant (SAV) effects were determined by SNAP (Bromberg and Rost, 2007), with negative scores identifying neutral SAVs and positive scores identifying non-neutral (effect) SAVs; the absolute value of a score indicates the reliability of the prediction and, for non-neutral variants, the size of the effect (Bromberg, et al, 2013)

  • We investigated a set of single amino acid variants (SAVs) in evolutionarily persistent, likely ancient, proteins, i.e., those that we expect to be optimized to tolerate variation
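
The sign convention described in the highlights (negative score for neutral, positive score for non-neutral, absolute value as reliability or effect size) can be illustrated with a minimal Python sketch. This is not SNAP's interface: the names `SavCall`, `interpret_score`, and `effect_fraction` are hypothetical, and the handling of a score of exactly zero is an assumption, since the text does not cover that case.

```python
from typing import NamedTuple, Sequence


class SavCall(NamedTuple):
    """Hypothetical container for one interpreted prediction score."""
    label: str        # "neutral", "effect", or "undetermined"
    magnitude: float  # absolute value of the score


def interpret_score(score: float) -> SavCall:
    """Label a signed score: negative is neutral, positive is non-neutral."""
    if score < 0:
        return SavCall("neutral", abs(score))
    if score > 0:
        return SavCall("effect", abs(score))
    # Assumption: the text does not say how a score of exactly zero is treated.
    return SavCall("undetermined", 0.0)


def effect_fraction(scores: Sequence[float]) -> float:
    """Fraction of variants whose score is positive (predicted non-neutral)."""
    if not scores:
        raise ValueError("no scores given")
    return sum(1 for s in scores if s > 0) / len(scores)
```

A fraction computed this way over a set of scored variants is the kind of quantity the abstract compares against expectation; no score thresholds for mild, moderate, or severe effects are assumed here because the text does not give them.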

Introduction

The vast majority of human genomic variants are single nucleotide variants (SNVs) (Durbin, et al, 2010). Coding region variants trivially make up a much smaller fraction of all variation than do noncoding variants (Lander, et al, 2001). The former affect protein structure/function and have a disproportionate effect on the molecular function of the cellular machinery. Each individual genome contains approximately ten thousand nsSNVs (non-synonymous SNVs, which change the amino acid sequence) (Shen, et al, 2013), a combination of which is responsible for a variety of observed phenotypes, including disease (Peterson, et al, 2013; Hassan, et al, 2019).
