Abstract

High-throughput sequencing produces an extraordinary amount of genomic data, organized into a number of high-dimensional datasets. Accordingly, visualization of genomic data has become essential for quality control, exploration, and data interpretation. The Variant Call Format (VCF) is a text file format generated during the variant calling process; it contains genomic information and the locations of variants in a group of sequenced samples. The current workflow for visualizing genomic variant data from VCF files requires combining several existing tools. Here, we describe VIVA (VIsualization of VAriants), a command-line utility and Jupyter Notebook-based tool for evaluating and sharing genomic data for variant analysis and quality control of sequencing experiments from VCF files. VIVA combines the functionality of existing tools into a single command to interactively evaluate and share genomic data, as well as to create publication-quality graphics.

Highlights

  • Next-generation sequencing produces an enormous amount of genomic data

  • BrowseVCF and VCF-Miner have many of the same variant filtering features as VIVA, but they have no options for visualization

  • One of our goals while building VIVA was to optimize the efficiency of reading and filtering large VCF files without the need for VCF file preprocessing

Summary

Introduction

Next-generation sequencing produces an enormous amount of genomic data. This genomic data is stored in standardized data structures that have been designed to facilitate efficient analysis. We introduce “VIVA”, a command-line utility and Jupyter Notebook[2]-based tool for evaluating and sharing genomic data for variant analysis and quality control of sequencing experiments from VCF files. To facilitate memory-efficient data retrieval, existing VCF file parsing and visualization tools require users to preprocess their VCF files. This entails compressing and sorting VCF files by genomic position before either subsetting the file with an external program, such as VCFtools[1], or indexing the files with Tabix[3]. We have prepared a detailed table of features comparing VIVA with some of the tools in the current workflow (VCFtools[1], GEMINI[4], BrowseVCF[5], VCF.Filter[6], VCF-Miner[7], VCF-Server[8], vcfR[9], IGV[10]) (Table 1). VIVA produces interactive HTML5-based visualizations, supports grouping of samples by shared metadata traits, and displays multiple genomic regions and genotypic-phenotypic associations in a single plot.
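To make the preprocessing trade-off concrete, the following is a minimal Python sketch of region filtering on a plain, unsorted, unindexed VCF by streaming it line by line. It is an illustration only, not VIVA's implementation: the names `Variant` and `iter_region` and the example records are hypothetical, and the sketch assumes that each sample column begins with the GT subfield, as the VCF specification requires when GT is present.

```python
import io
from typing import Dict, Iterator, List, NamedTuple, TextIO


class Variant(NamedTuple):
    """Hypothetical record type holding only the fields used in this sketch."""
    chrom: str
    pos: int                    # 1-based position, as in the VCF specification
    ref: str
    alt: str
    genotypes: Dict[str, str]   # sample name -> raw genotype string, e.g. "0/1"


def iter_region(handle: TextIO, chrom: str, start: int, end: int) -> Iterator[Variant]:
    """Stream a VCF and yield the records that fall in chrom:start-end (inclusive).

    Every line is visited once, so no sorting, compression, or index is needed
    and memory use does not grow with file size. The cost is a full scan per query.
    """
    samples: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                      # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]          # sample names follow the 9 fixed columns
            continue
        if fields[0] != chrom:
            continue
        pos = int(fields[1])
        if not (start <= pos <= end):
            continue
        # Assumption: GT is the first colon-separated subfield of each sample column.
        genotypes = {name: col.split(":")[0] for name, col in zip(samples, fields[9:])}
        yield Variant(fields[0], pos, fields[3], fields[4], genotypes)


# Made-up example records, deliberately not sorted by position.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:30",
    "chr2\t150\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/0:8\t0/1:15",
    "chr1\t50\t.\tT\tC\t40\tPASS\t.\tGT:DP\t0/0:20\t0/1:9",
]) + "\n"

hits = list(iter_region(io.StringIO(EXAMPLE_VCF), "chr1", 1, 100))
for v in hits:
    print(v.chrom, v.pos, v.ref, v.alt, v.genotypes)
```

An index such as the one Tabix builds lets a tool jump directly to a region, but it requires the sorted, compressed input described above. The streaming approach trades that random access for the ability to work on the file as delivered.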

