Abstract

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. We propose here a semi-supervised approach, GenoNet, to jointly utilize experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell/tissue type specific epigenetic annotations to predict functional consequences of non-coding variants. Through the application to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy compared to existing functional prediction methods at the tissue/cell type level, but especially so at the organism level. Importantly, we illustrate how the GenoNet scores can help in fine-mapping at GWAS loci, and in the discovery of disease associated genes in sequencing studies. As more comprehensive lists of experimentally validated variants become available over the next few years, semi-supervised methods like GenoNet can be used to provide increasingly accurate functional predictions for variants genome-wide and across a variety of cell/tissue types.

Highlights

  • Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem

  • We propose a semi-supervised regularization algorithm, referred to as GenoNet, for functional prediction at cell type/tissue level using labeled data from MPRA experiments and genome-wide functional annotations in 127 different cell types and tissues from the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects

  • We adopt Elastic-net because of its superior performance when the features are correlated and have sparse non-zero coefficients[21]. We further justify this choice of supervised algorithm for GenoNet via numerical simulations with MPRA labels for m variants
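The preference for Elastic-net when features are correlated can be illustrated with a toy example. The sketch below is not the GenoNet implementation: it is a minimal pure-Python coordinate-descent solver written for illustration, and the function names (`soft_threshold`, `elastic_net_fit`), the penalty parameterization (`alpha`, `l1_ratio`) and the three-row dataset are all assumptions. It fits two perfectly correlated features and contrasts a pure L1 penalty, which puts all the weight on one copy, with an Elastic-net penalty, which shares the weight between them.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha=1.0, l1_ratio=0.5, n_sweeps=500):
    """Coordinate descent for the (assumed) objective

        (1 / 2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2

    X is a list of rows, y a list of responses; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            denom = col_sq[j] + l2
            if denom == 0.0:
                continue  # all-zero feature with no ridge term: leave at 0
            # Correlation of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, l1) / denom
            delta = new_w - w[j]
            for i in range(n):
                residual[i] -= delta * X[i][j]
            w[j] = new_w
    return w


if __name__ == "__main__":
    # Two identical (perfectly correlated) toy features; y = 2 * feature.
    X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    y = [2.0, 4.0, 6.0]
    print("L1 only    :", elastic_net_fit(X, y, alpha=1.0, l1_ratio=1.0))
    print("Elastic-net:", elastic_net_fit(X, y, alpha=1.0, l1_ratio=0.5))
```

With the pure L1 penalty the second copy of the feature is dropped, whereas the added L2 term makes the solution unique and splits the weight evenly. This grouping behaviour is one reason an Elastic-net penalty is often preferred for correlated annotations; the penalty values used here are arbitrary.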


Summary

Introduction

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Genome-wide association studies (GWAS) have identified a large number of non-coding variants that are likely to be involved in both genetic and epigenetic gene regulation in a highly context-specific manner[2]. Accurately predicting both organism-level and cell type/tissue-specific functional consequences of non-coding variation is therefore of great interest. Recent developments in high-throughput assays that assess the functional impact of variants in regulatory regions (e.g. MPRAs, CRISPR/Cas9-mediated in situ saturating mutagenesis) can generate high-quality data on the functional effects of genetic variants in various contexts. Although these experimental approaches are currently quite laborious and difficult to implement, data on even a modest number of variants from such experiments can be used to train (semi-)supervised approaches for improved prediction accuracy. These applications clearly show the importance of tissue/cell type-specific scores in gene discovery for complex traits.

