Abstract
This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38, and for mono-allelic variants and complex multi-allelic variants. The tables are also augmented with allele frequencies from the ExAC and gnomAD datasets, which are often consulted when analyzing ClinVar variants. Overall, this work provides ClinVar data in a format that is easier to work with and can be loaded directly into popular analysis tools such as R, Python pandas, and SQL databases.
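As a rough illustration of what loading such a table might look like, the sketch below reads a tab-delimited table into pandas and then into an in-memory SQLite database. The column names (`chrom`, `pos`, `ref`, `alt`, `clinical_significance`, `gnomad_af`) and the three sample rows are assumptions made up for this example and are not taken from the released files; in practice a path to one of the generated tables would be passed in place of the in-memory sample.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical three-row excerpt. The real tables may use different
# column names; these are placeholders for illustration only.
SAMPLE_TSV = (
    "chrom\tpos\tref\talt\tclinical_significance\tgnomad_af\n"
    "1\t1000\tA\tG\tPathogenic\t0.00001\n"
    "2\t2000\tC\tT\tBenign\t0.12\n"
    "X\t3000\tG\tA\tUncertain significance\t\n"
)


def load_table(source):
    """Read a tab-delimited table, keeping chromosome names as strings."""
    return pd.read_csv(source, sep="\t", dtype={"chrom": str})


variants = load_table(io.StringIO(SAMPLE_TSV))

# Keep variants with an allele frequency below 1%. Rows with a missing
# frequency (NaN) fail the comparison and are therefore excluded.
rare = variants[variants["gnomad_af"] < 0.01]

# The same table can be written to a SQL database and queried there.
conn = sqlite3.connect(":memory:")
variants.to_sql("clinvar", conn, index=False)
n_pathogenic = conn.execute(
    "SELECT COUNT(*) FROM clinvar WHERE clinical_significance = 'Pathogenic'"
).fetchone()[0]
conn.close()
```

Reading the chromosome column as a string avoids mixing integer and text values (for example `1` and `X`) in one column, which is a common pitfall when loading genomic tables.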
Highlights
ClinVar[1] is a public database hosted by the National Center for Biotechnology Information (NCBI) for the purpose of collecting information on genotype-phenotype relationships in the human genome.
The TXT file is organized around Allele ID, and the reported clinical significance is aggregated over distinct disorders.
To facilitate access to accurate and comprehensive ClinVar data at scale, we developed this software tool to convert the latest raw ClinVar data into multiple data files according to user-specified options.
Summary
ClinVar[1] is a public database hosted by the National Center for Biotechnology Information (NCBI) for the purpose of collecting information on genotype-phenotype relationships in the human genome. The XML file is a comprehensive representation, but it is organized around unique variant-condition combinations and is large and complex, which makes it difficult to quickly look up a variant of interest. Many potential users considering larger-scale analyses may also be unfamiliar with the tools required to parse this data format. Both the XML and TXT representations contain many genomic coordinates that have been parsed from HGVS notation. The files produced by this pipeline provide a summary of the most relevant fields for a range of uses in an accessible format.