Abstract

Following variant calling and annotation, accurate variant filtering is a crucial step in extracting meaningful information from sequencing data and investigating disease aetiology. However, the variant call format (VCF) used to store this information is not easy for non-bioinformaticians to handle. We present BrowseVCF, a flexible and intuitive software tool that enables researchers to browse and filter millions of variants in a few seconds. Key features include querying user-defined gene lists, grouping samples for family or tumour/normal studies and exporting results in spreadsheet format. BrowseVCF’s significant advantages over most existing tools include the ability to process data from any DNA sequencing experiment (exome, whole-genome and amplicons) and to correctly parse files annotated with Variant Effect Predictor. BrowseVCF can be used either locally on personal computers or as part of automated pipelines. Its user interface has been carefully designed to minimize tunable parameters. BrowseVCF is freely available from https://github.com/BSGOxford/BrowseVCF/releases/latest.

Highlights

  • Recent developments of Next-Generation Sequencing (NGS) technologies have led to a dramatic reduction in sequencing costs that, in turn, made DNA sequencing analyses accessible to small- and medium-sized laboratories

  • The variant call format (VCF) is well defined [1]. It consists of a header section, containing an arbitrary number of meta-information lines that start with the symbol ‘#’, and a data section, containing one line per variant, split into eight mandatory columns: chromosome (CHROM), 1-based starting position of the variant (POS), unique identifier, if existing (ID), reference allele in the genome (REF), alternative allele(s) of the variant (ALT), Phred-scaled quality score (QUAL), flag for passed/failed control checks (FILTER) and variant-specific annotations (INFO), which can be an unrestricted number of either flags or key-value pairs

  • We tested the performance of BrowseVCF on two different file types: an exome trio and the whole-genome v2.18 of sample NA12878 generated by the ‘Genome in a Bottle Consortium’
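
The VCF layout described in the highlights above (header lines starting with ‘#’, then one tab-separated line per variant with eight mandatory columns) can be illustrated with a minimal Python sketch. This is not BrowseVCF's own code: the names VcfRecord, parse_info, parse_vcf_line and read_vcf are hypothetical, the sample lines are invented for illustration, and the sketch ignores per-sample genotype columns and any escaping rules.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Union

InfoValue = Union[str, bool]


class VcfRecord(NamedTuple):
    """One data line of a VCF file (hypothetical container, eight mandatory columns)."""
    chrom: str
    pos: int                      # 1-based start position
    variant_id: str               # "." when no identifier exists
    ref: str
    alt: List[str]                # one or more alternative alleles
    qual: Optional[float]         # None when the score is missing (".")
    filter_status: str
    info: Dict[str, InfoValue]    # flags map to True, key-value pairs to strings


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Split the INFO column into flags and key-value pairs."""
    info: Dict[str, InfoValue] = {}
    if field == ".":
        return info
    for entry in field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # An entry without "=" is a flag; otherwise keep the raw string value.
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated data line into a VcfRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = cols[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=variant_id,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter_status=filter_status,
        info=parse_info(info),
    )


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records from an iterable of lines, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    # Invented example content, not taken from any real data set.
    sample = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t12345\trs001\tA\tG,T\t50.0\tPASS\tDP=100;DB\n",
    ]
    for record in read_vcf(sample):
        print(record)
```

Keeping INFO as a plain dictionary reflects the point made above that it may hold an unrestricted number of flags or key-value pairs; a real tool would also use the header's meta-information lines to type each value.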


Introduction

Recent developments of Next-Generation Sequencing (NGS) technologies have led to a dramatic reduction in sequencing costs that, in turn, made DNA sequencing analyses accessible to small- and medium-sized laboratories. The variant call format (VCF), originally developed for the 1000 Genomes Project, has become the standard for storing DNA variants together with rich annotations [1], and has led to the development of a variety of command-line analysis tools (for instance, VCFtools [1] or GEMINI [2]). Recent efforts have been made towards developing more user-friendly graphical software to increase accessibility to non-bioinformaticians; examples include the commercial suites Ingenuity Com/alamut-visual/), GoldenHelix SNP (http://goldenhelix.com/SNP_Variation/index.html) and VariantStudio (http://www.illumina.com/informatics/research/biological-data-interpretation/variantstudio.html), as well as the open-source packages SNVerGUI [3], database.bio [4] and gNOME [5]. Researchers might be interested in keeping their personal or in-house

