Abstract

Summary

Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome.

Availability and implementation

GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp.

Supplementary information

Supplementary data are available at Bioinformatics online.
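The abstract does not describe GenomeWarp's actual algorithm. The sketch below shows one simple meaning of "conservative" region transformation. A region is lifted only when it falls entirely inside a single ungapped alignment block, in the spirit of a UCSC chain file. Regions that span a gap or a block boundary are dropped rather than guessed, which trades completeness for fewer false calls. The `Block` and `Region` records, the `liftConservatively` method, and the same-strand-only assumption are all hypothetical and are not GenomeWarp's API. The sketch needs Java 16+ for records.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not GenomeWarp's implementation; same-strand blocks only.
public class ConservativeLiftSketch {

    /** Ungapped block: source [srcStart, srcStart + length) maps to target [tgtStart, tgtStart + length). */
    record Block(String srcChrom, long srcStart, String tgtChrom, long tgtStart, long length) {}

    /** Half-open, 0-based genomic interval. */
    record Region(String chrom, long start, long end) {}

    /**
     * Lifts a region only if it lies entirely inside one ungapped block.
     * Regions crossing a gap or block boundary return Optional.empty().
     */
    static Optional<Region> liftConservatively(Region r, List<Block> blocks) {
        for (Block b : blocks) {
            boolean sameChrom = b.srcChrom().equals(r.chrom());
            boolean contained = r.start() >= b.srcStart() && r.end() <= b.srcStart() + b.length();
            if (sameChrom && contained) {
                long offset = b.tgtStart() - b.srcStart();
                return Optional.of(new Region(b.tgtChrom(), r.start() + offset, r.end() + offset));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block("chr1", 1000, "chr1", 1500, 200),  // source 1000-1200 -> target 1500-1700
            new Block("chr1", 1250, "chr1", 1800, 100)); // 50 bp source gap before this block
        Region inside = new Region("chr1", 1050, 1100);
        Region spanning = new Region("chr1", 1150, 1300);
        System.out.println(liftConservatively(inside, blocks));   // Optional[Region[chrom=chr1, start=1550, end=1600]]
        System.out.println(liftConservatively(spanning, blocks)); // Optional.empty
    }
}
```

Dropping the spanning region, instead of splitting it around the gap, is the most cautious choice. A less strict variant could keep the parts of a region that do map cleanly.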

Highlights

  • The Human Genome Project produced the first full draft of the human genome sequence (International Human Genome Sequencing Consortium, 2001)

  • Higher quality reference genome sequences improve the mapping and alignment of sequence read data, but present challenges for integrating data mapped to other genome assembly versions

  • The workflow of GenomeWarp is as follows (Supplementary Fig. S1): an input Genome VCF (gVCF) is modified into source variants and confidently called source regions
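The first workflow step above separates a gVCF into source variants and confidently called source regions. It can be sketched roughly as follows. Per the VCF specification, gVCF reference blocks carry a symbolic ALT allele (`<NON_REF>` in GATK-style files, `<*>` in others) and an `END` INFO field, while variant records carry a real ALT allele. The confidence test is an assumption made for illustration: a hypothetical genotype-quality cutoff `MIN_GQ = 20`. The class and method names are also hypothetical. GenomeWarp's real criteria are in its user manual and Supplementary Fig. S1. The sketch needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a single-sample gVCF; not GenomeWarp's code.
public class GvcfSplitSketch {

    /** 0-based, half-open region. */
    record Region(String chrom, long start, long end) {}

    /** Minimal variant; pos is the 1-based VCF POS. */
    record Variant(String chrom, long pos, String ref, String alt) {}

    // Hypothetical confidence cutoff (assumption, not GenomeWarp's rule).
    static final int MIN_GQ = 20;

    final List<Region> confidentRegions = new ArrayList<>();
    final List<Variant> variants = new ArrayList<>();

    void accept(String line) {
        if (line.startsWith("#")) return; // skip header lines
        String[] f = line.split("\t");
        String chrom = f[0];
        long pos = Long.parseLong(f[1]);
        String ref = f[3];
        String alt = f[4];
        boolean confident = genotypeQuality(f[8], f[9]) >= MIN_GQ;
        boolean refBlock = alt.equals("<NON_REF>") || alt.equals("<*>");
        if (!refBlock) {
            // Strip the trailing symbolic allele appended to gVCF variant records.
            String realAlt = alt.replace(",<NON_REF>", "").replace(",<*>", "");
            variants.add(new Variant(chrom, pos, ref, realAlt));
            if (confident) confidentRegions.add(new Region(chrom, pos - 1, pos - 1 + ref.length()));
            return;
        }
        long end = pos; // END is 1-based inclusive, so it equals the half-open 0-based end
        for (String kv : f[7].split(";")) {
            if (kv.startsWith("END=")) end = Long.parseLong(kv.substring(4));
        }
        if (confident) confidentRegions.add(new Region(chrom, pos - 1, end));
    }

    static int genotypeQuality(String format, String sample) {
        String[] keys = format.split(":");
        String[] vals = sample.split(":");
        for (int i = 0; i < keys.length && i < vals.length; i++) {
            if (keys[i].equals("GQ") && !vals[i].equals(".")) return Integer.parseInt(vals[i]);
        }
        return 0; // missing GQ is treated as not confident
    }

    public static void main(String[] args) {
        GvcfSplitSketch s = new GvcfSplitSketch();
        s.accept("chr1\t100\t.\tA\t<*>\t0\t.\tEND=199\tGT:GQ\t0/0:50");
        s.accept("chr1\t200\t.\tC\tT,<*>\t45\tPASS\t.\tGT:GQ\t0/1:40");
        s.accept("chr1\t201\t.\tG\t<*>\t0\t.\tEND=300\tGT:GQ\t0/0:5");
        System.out.println(s.confidentRegions);
        System.out.println(s.variants);
    }
}
```

In this example, the low-GQ block at chr1:201-300 is left out of the confident regions. A later transformation step would then treat those positions as unknown rather than as confidently homozygous reference.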



Introduction

The Human Genome Project produced the first full draft of the human genome sequence (International Human Genome Sequencing Consortium, 2001). The assembly of the human genome has been refined and updated multiple times (International Human Genome Sequencing Consortium, 2004). Higher quality reference genome sequences improve the mapping and alignment of sequence read data, but present challenges for integrating data mapped to other genome assembly versions. Many tools perform genomic region lift over, including UCSC LiftOver (Kuhn et al., 2013) and CrossMap (Zhao et al., 2014). These tools support lift over of multiple data formats, with
