A Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data

Qiongshi Lu, Hongyu Zhao, Kei-Hoi Cheung, Yuwei Cheng, Yiming Hu, Jiehuan Sun

doi:10.1038/srep10576

Open Access

Abstract

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome, either through computational predictions, such as genomic conservation, or through high-throughput experiments, such as the ENCODE project. These efforts have produced a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu

Highlights

Annotating functional elements in the human genome is a major goal in human genetics
We present GenoCanyon, a whole-genome annotation tool based on unsupervised statistical learning
The prediction results showed that GenoCanyon can detect functional regions in the human genome, a capability that most existing whole-genome annotation tools lack

Summary

Introduction

Annotating functional elements in the human genome is a major goal in human genetics. High-throughput experiments, e.g. the ENCODE project[7], suggest that a large fraction of the human genome is functionally relevant. All of this evidence points to the need to extend annotation tools from the coding regions to the entire human genome. Prediction of deleteriousness does not cover every aspect of functional annotation. The potential of these variant classifiers for understanding genomic architecture on a large scale, and for detecting regulatory elements such as cis-regulatory modules, remains to be thoroughly investigated. There are two broad options: a supervised approach, which needs gold-standard datasets to train the model, and an unsupervised approach, which uses no labeled data. In this article we focus on developing an unsupervised learning method, because current supervised-learning-based annotation tools suffer from highly biased training data, largely owing to our limited knowledge of non-coding regions. GenoCanyon's flexible and generalizable statistical framework could benefit future applications.
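To make the idea of unsupervised annotation concrete, the following is a minimal sketch, not the GenoCanyon implementation, whose model details are not given in this summary. It assumes a hypothetical setting in which each genomic position has a handful of binary annotation calls that are conditionally independent given a hidden functional or non-functional state, and it fits a two-component Bernoulli mixture by EM without using any labels. The function name `fit_two_class_mixture`, the simulated data, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def fit_two_class_mixture(X, n_iter=200, tol=1e-8):
    """EM for a two-component mixture of independent Bernoulli features.

    X is an (n_positions, n_annotations) binary matrix of hypothetical
    annotation calls. Returns (pi, p1, p0, post), where post[i] is the
    posterior probability that position i belongs to component 1.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    col_mean = X.mean(axis=0)
    # Deterministic start: component 1 is biased toward annotations being present.
    pi = 0.5
    p1 = np.clip(col_mean * 1.5, 0.05, 0.95)
    p0 = np.clip(col_mean * 0.5, 0.05, 0.95)
    eps = 1e-6
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log joint probability of each row under each component.
        log1 = np.log(pi) + X @ np.log(p1) + (1 - X) @ np.log(1 - p1)
        log0 = np.log(1 - pi) + X @ np.log(p0) + (1 - X) @ np.log(1 - p0)
        m = np.maximum(log1, log0)
        log_norm = m + np.log(np.exp(log1 - m) + np.exp(log0 - m))
        post = np.exp(log1 - log_norm)
        # M-step: re-estimate the mixing weight and per-annotation frequencies.
        w1 = post.sum()
        w0 = n - w1
        pi = float(np.clip(w1 / n, eps, 1 - eps))
        p1 = np.clip(post @ X / max(w1, eps), eps, 1 - eps)
        p0 = np.clip((1 - post) @ X / max(w0, eps), eps, 1 - eps)
        # Stop once the log-likelihood no longer improves.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, p1, p0, post

# Simulated data (an assumption for the demo, not real annotation tracks).
rng = np.random.default_rng(0)
n, d = 2000, 6
truth = rng.random(n) < 0.3                      # hidden "functional" state
present_if_functional = rng.random((n, d)) < 0.8
present_otherwise = rng.random((n, d)) < 0.1
X = np.where(truth[:, None], present_if_functional, present_otherwise).astype(int)

pi, p1, p0, post = fit_two_class_mixture(X)
accuracy = float(((post > 0.5) == truth).mean())
```

The hidden labels in `truth` are used only to check the result afterwards; the fit itself sees nothing but the annotation matrix, which is the property that lets an unsupervised method sidestep biased training sets.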

Journal: Scientific Reports
Publication Date: May 27, 2015
Citations: 174
License type: CC BY 4.0
