Abstract

This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38 as well as for mono-allelic variants and complex multi-allelic variants. Additionally, the tables are augmented with allele frequencies from the ExAC and gnomAD datasets as these are often consulted when analyzing ClinVar variants. Overall, this work provides ClinVar data in a format that is easier to work with and can be directly loaded into a variety of popular analysis tools such as R, python pandas, and SQL databases.

Highlights

  • ClinVar[1] is a public database hosted by the National Center for Biotechnology Information (NCBI) for the purpose of collecting information on genotype-phenotype relationships in the human genome

  • The TXT file is organized around Allele ID and the reported clinical significance is aggregated over distinct disorders

  • To facilitate access to accurate and comprehensive ClinVar data at scale, we developed this software tool to convert the latest raw ClinVar data into multiple data files with options specified by users

Read more

Summary

Introduction

ClinVar[1] is a public database hosted by the National Center for Biotechnology Information (NCBI) for the purpose of collecting information on genotype-phenotype relationships in the human genome. The XML file is a comprehensive representation, but is organized around unique variant-condition combinations and is large and complex, making it difficult to quickly look up a variant of interest, and many potential users considering larger scale analyses may not be familiar with tools required to parse this data format. Both the XML and TXT representations contain many genomic coordinates that have been parsed from HGVS notation. The resulting files provide a summary of the most relevant fields for a range of uses in an accessible format

Methods
Limitations
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call