Abstract

Background: Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease associations, genetic burden, somatic variants, and disease mechanisms.

Description: We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples.

Conclusions: RVS facilitates cross-study analysis to discover novel genetic risk factors, gene–disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization.

Availability: A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0865-9) contains supplementary material, which is available to authorized users.

Highlights

  • Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations

  • Due to its large reference populations, Reference Variant Store (RVS) can be employed for variant filtration and gene prioritization

  • While centralized, one-time annotation of variants is primarily aimed at decreasing the operations needed to fully annotate new studies, the accumulated allele frequencies provide a fundamental basis for analyses of disease populations, surpassing the capabilities of each individual study to function as a reference population

Introduction

Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. As high-throughput sequencing technologies become more widely employed, variants detected in large resequencing studies are continuously being published, including the 1000 Genomes Project, ESP6500, ExAC, and TCGA [1,2,3,4]. These variants differ from the ones targeted by genotyping arrays in that most of them will initially not be properly annotated with genes, amino acid changes, impacts, associated diseases, or population frequencies. Our major goal is to build an infrastructure that allows centralized storage of every variant observed in resequencing studies, in-house projects, or known in curated databases. In this centralized storage, variants will be annotated once using a spectrum of tools for functional impact and predictions, as well as population frequencies, disease associations, pharmacogenetic information, literature mining, and so on. The accumulated allele frequencies help to gain an understanding of the distribution of disease-associated variants in reference populations.
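To make the idea of annotating each variant once and accumulating allele frequencies across studies more concrete, the following is a minimal in-memory sketch in Python. It is not the RVS implementation or schema: the class names (`VariantStore`, `VariantRecord`), the variant key layout, the cohort labels, and the 1% rarity threshold are all hypothetical choices made for illustration. It also assumes that cohorts do not share samples, since overlapping samples would be double-counted by simple pooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


@dataclass
class VariantRecord:
    # Per-cohort (alternate allele count, total allele number).
    counts: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    # Annotations are attached once per variant, not once per study.
    annotations: Dict[str, str] = field(default_factory=dict)

    def pooled_frequency(self) -> float:
        """Alternate allele frequency pooled over all cohorts (0.0 if no data)."""
        alt_total = sum(ac for ac, _ in self.counts.values())
        allele_total = sum(an for _, an in self.counts.values())
        return alt_total / allele_total if allele_total else 0.0


class VariantStore:
    """Illustrative central store keyed by variant; assumes disjoint cohorts."""

    def __init__(self) -> None:
        self._records: Dict[VariantKey, VariantRecord] = defaultdict(VariantRecord)

    def add_observation(self, key: VariantKey, cohort: str,
                        alt_count: int, allele_number: int) -> None:
        # Record (or overwrite) the allele counts reported by one cohort.
        self._records[key].counts[cohort] = (alt_count, allele_number)

    def annotate(self, key: VariantKey, **annotations: str) -> None:
        # Merge annotations into the single shared record for this variant.
        self._records[key].annotations.update(annotations)

    def pooled_frequency(self, key: VariantKey) -> float:
        # .get() avoids creating an empty record for an unseen variant.
        record = self._records.get(key)
        return record.pooled_frequency() if record else 0.0

    def is_rare(self, key: VariantKey, threshold: float = 0.01) -> bool:
        # Hypothetical filter: rare if pooled frequency is below the threshold.
        return self.pooled_frequency(key) < threshold


if __name__ == "__main__":
    store = VariantStore()
    key = ("1", 12345, "A", "G")                      # made-up variant
    store.add_observation(key, "cohort_a", 5, 1000)   # made-up counts
    store.add_observation(key, "cohort_b", 15, 3000)
    store.annotate(key, gene="GENE1", impact="missense")
    print(store.pooled_frequency(key), store.is_rare(key))
```

Pooling counts rather than averaging per-cohort frequencies weights each cohort by its size, which is one simple way a combined reference population can be more informative than any single study. A real system would additionally need to handle variant normalization, population stratification, and differences in sequencing coverage, none of which this sketch attempts.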
