Abstract

Despite the growing constellation of genetic loci linked to common traits, these loci have yet to account for most heritable variation, and most act through poorly understood mechanisms. Recent machine learning (ML) systems have used hierarchical biological knowledge to associate genetic mutations with phenotypic outcomes, yielding substantial predictive power and mechanistic insight. Here, we use an ontology-guided ML system to map single nucleotide variants (SNVs) focusing on 6 classic phenotypic traits in natural yeast populations. The 29 identified loci are largely novel and account for ~17% of the phenotypic variance, versus <3% for standard genetic analysis. Representative results show that sensitivity to hydroxyurea is linked to SNVs in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. This work demonstrates a knowledge-based approach to amplifying and interpreting signals in population genetic studies.

Highlights

  • In recent decades, genome-wide association studies (GWAS) in humans have identified almost 19,000 associations between genetic loci and phenotypic traits [1]

  • We find that sensitivity to hydroxyurea is tied to genetic variants in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism

  • We explored mapping of causal variants from GWAS using genotype-phenotype data previously gathered in approximately 1000 natural S. cerevisiae isolates [29]

Summary

Introduction

Genome-wide association studies (GWAS) in humans have identified almost 19,000 associations between genetic loci and phenotypic traits [1]. Of the various explanations put forward for why these loci have yet to account for most heritable variation, a frequently discussed possibility is that complex disease genetics are driven by large numbers of alleles, each with a small effect size, making them hard to detect through genome-wide association [3]. To address this challenge, more complex models such as polygenic risk scores (PRS) have been developed, which sum effects across many variants to predict phenotype [4,5,6]. As many of the variants identified by GWAS are located in noncoding regions, follow-up experiments typically entail reporter assays, validation of transcription factor binding sites, animal models and genome engineering [11,12,13]. Even these techniques do not begin to address functional effects of the variant beyond the gene, such as impacts on the states of proteins, protein complexes, metabolic processes and signaling pathways, and the composition of cell types. The process of translating an associated locus to a causal single nucleotide variant (SNV), then to a causal gene and its underlying biological mechanism, is still far from routine.
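To make the additive idea behind a polygenic risk score concrete, the following is a minimal sketch rather than the method used in this work. It assumes each variant contributes its allele dosage (0, 1 or 2 copies) multiplied by a per-allele effect size, and that these contributions are simply summed. The function name `polygenic_score`, the strain labels and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], effect_sizes: Sequence[float]) -> float:
    """Additive score: sum over variants of (allele dosage * per-allele effect)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(g * b for g, b in zip(dosages, effect_sizes))


# Hypothetical per-allele effect sizes for three variants (illustrative values only).
effect_sizes = [0.5, -0.25, 0.125]

# Hypothetical allele dosages (0, 1 or 2 copies) for two made-up individuals.
individuals = {
    "strain_a": [0, 1, 2],
    "strain_b": [2, 0, 0],
}

# One score per individual.
scores = {name: polygenic_score(g, effect_sizes) for name, g in individuals.items()}
```

In practice the effect sizes would come from association statistics rather than being set by hand. Because the score treats every variant independently, it cannot by itself indicate which gene, pathway or process a variant acts through.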

