Abstract

Motivation: For most research approaches, genome analyses depend on the existence of a high-quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating their concordance. These analyses include a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and it offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation, including annotation from Ensembl or TrackHub sources. We provide gEVAL websites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable so that it can be used for any organism and assembly.

Availability and Implementation: Web browser: http://geval.sanger.ac.uk; plugin: http://wchow.github.io/wtsi-geval-plugin

Contact: kj2@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
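Abnormal coverage is one of the signals gEVAL highlights. The text does not say how it is detected, so the following is only a minimal sketch under our own assumptions. It assumes read depth has already been summarized per fixed-size window, and it flags a window when its depth falls below half or rises above twice the median. The function name and both thresholds are hypothetical and are not taken from gEVAL.

```python
from statistics import median


def flag_abnormal_coverage(depths, window_size, low_frac=0.5, high_frac=2.0):
    """Flag windows whose read depth departs strongly from the median depth.

    depths: mean read depth per fixed-size window, in genome order (assumed input).
    window_size: window length in bp; coordinates are 0-based, half-open.
    low_frac / high_frac: hypothetical thresholds relative to the median depth.
    Returns a list of (start, end, depth, label) tuples.
    """
    if not depths:
        return []
    baseline = median(depths)  # median is robust to the outliers we are looking for
    flagged = []
    for i, depth in enumerate(depths):
        start, end = i * window_size, (i + 1) * window_size
        if depth < low_frac * baseline:
            flagged.append((start, end, depth, "low"))    # possible collapse or missing sequence
        elif depth > high_frac * baseline:
            flagged.append((start, end, depth, "high"))   # possible collapsed repeat
    return flagged
```

For example, with 1 kb windows and depths `[30, 31, 29, 2, 30, 75, 28]`, the median is 30. The window at 3000-4000 is therefore flagged as low and the window at 5000-6000 as high. A real pipeline would also need to handle low-complexity sequence and allelic differences, which this sketch ignores.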

Highlights

  • Reference genomes are the foundation for genomic biology

  • In gEVAL, comparative analysis focuses on genomic alignments among the different assemblies available for the same species (Figure 1E)

  • This is useful for capturing sequence differences caused by both variation and misassembly, and it aids the improvement of one assembly with components or guidance from another (Supplementary Figure 1)
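To make the idea of "sequence differences between assemblies" concrete, here is a minimal sketch that enumerates the differences within one aligned region. It assumes a pairwise alignment is already available as two equal-length gapped strings (with `-` marking gaps). It is not gEVAL's alignment pipeline, and the function name is hypothetical. Deciding whether a given difference reflects allelic variation or a misassembly still requires the supporting evidence described above.

```python
def alignment_differences(ref_aln, qry_aln):
    """List mismatches and indels in a pairwise alignment of two assemblies.

    ref_aln, qry_aln: equal-length aligned strings; '-' marks a gap (assumed format).
    Returns (ref_pos, kind, ref_base, qry_base) tuples, where ref_pos is a
    0-based coordinate in the ungapped first (reference) sequence.
    For an insertion, ref_pos is the reference position the inserted base precedes.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both; carries no information
        if r == "-":
            diffs.append((ref_pos, "insertion", r, q))  # base present only in the query
        elif q == "-":
            diffs.append((ref_pos, "deletion", r, q))   # base missing from the query
            ref_pos += 1
        else:
            if r != q:
                diffs.append((ref_pos, "mismatch", r, q))
            ref_pos += 1
    return diffs
```

For instance, aligning `ACGT-ACGT` against `ACGTTACCT` reports an insertion of `T` before reference position 4 and a `G`/`C` mismatch at position 6.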


Summary

Introduction

Reference genomes are the foundation for genomic biology. As more de novo sequencing projects are conducted and more draft genomes are released, the continuing challenge is to create assemblies that are complete and correct enough for the research community to use confidently as references. Groups that set out to create a genome reference may have at their disposal multiple alternative assemblies together with resources such as clone libraries, collections of short reads, cDNA sequences, RNA-seq data, physical maps, optical maps, or genetic markers. These datasets help to repair and reorganize genomic regions, or to devise a strategy for infusing new sequence into a draft assembly. We introduce the gEVAL Browser project, a collection of software and frequently updated databases for key species. It takes a tiling path as its backbone, runs analyses using new sources of data, and regularly releases the results in a web interface where users can evaluate sequence integrity and plan strategies for sequence management.
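A tiling path is the ordered list of components (for example clones) and gaps that make up each assembled sequence, and it is the backbone gEVAL builds on. The text does not say how gEVAL stores it. Genome Reference Consortium tiling paths are commonly exchanged as AGP files, so this minimal sketch assumes tab-separated AGP 2.0 columns with 1-based inclusive coordinates. It checks only two basic consistency rules: parts of each object must abut without gaps or overlaps, and each part's span must match its component or gap length. The function name is hypothetical, and this is not gEVAL code.

```python
from collections import defaultdict


def check_tiling_path(agp_lines):
    """Report basic inconsistencies in a tiling path given as AGP-style lines.

    Assumed AGP 2.0 columns (tab-separated, 1-based inclusive coordinates):
      object, object_beg, object_end, part_number, component_type, then either
      component_id, component_beg, component_end, orientation   (sequence parts)
      or gap_length, gap_type, linkage, linkage_evidence        (N/U gap parts)
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    last_end = defaultdict(int)  # object name -> end coordinate of its previous part
    for lineno, line in enumerate(agp_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        obj, beg, end, kind = cols[0], int(cols[1]), int(cols[2]), cols[4]
        # Rule 1: each part must start right after the previous part of the same object.
        if beg != last_end[obj] + 1:
            problems.append(
                f"line {lineno}: {obj} part starts at {beg}, expected {last_end[obj] + 1}"
            )
        # Rule 2: the object span must equal the gap length or the component span.
        span = end - beg + 1
        if kind in ("N", "U"):
            expected = int(cols[5])
        else:
            expected = int(cols[7]) - int(cols[6]) + 1
        if span != expected:
            problems.append(f"line {lineno}: object span {span} != part length {expected}")
        last_end[obj] = end
    return problems
```

An empty result only means that the path is internally consistent. Whether the components are correct and correctly ordered is the question the alignment and map evidence in gEVAL is meant to answer.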
