Abstract

Background: Scaffolding is an essential step in the genome assembly process. Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies. Here, we describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring.

Results: We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a Whole Genome Profiling™ (WGP) map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the contiguity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase contiguity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency.

Conclusions: MaGuS is a powerful reference-free evaluator of assembly quality and a WGP map-guided scaffolder that is freely available at https://github.com/institut-de-genomique/MaGuS. Its use can be extended to other high-throughput sequencing data (e.g., long-read data) and also to other map data (e.g., genetic maps) to improve the quality and the contiguity of large and complex genome assemblies.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0969-x) contains supplementary material, which is available to authorized users.

Highlights

  • Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies

  • Scaffolding is an essential step in the genome assembly process

  • The method presented here and implemented in MaGuS enabled the evaluation of the quality and the scaffolding of a draft genome assembly using a physical map and high-throughput sequencing (HTS) data


Summary

Introduction

Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies. We describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring. The genome assembly process usually involves four main steps: assembly of reads into contiguous sequences (contigs), linking of contigs into larger gap-containing sequences (scaffolds), gap closing to fill the gaps generated by scaffolding, and anchoring onto a genetic map to build the final pseudomolecules. End sequences of large fragments (>1 kb) or long reads are aligned to the [...] a contigs junction; as a consequence, low-covered contigs are overlooked for scaffold building. Several different types of genome maps can be produced to drive or improve assemblies, including physical maps, optical maps, and genetic maps. (Source: Madoui et al., BMC Bioinformatics (2016) 17:115.)
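To make the ideas of scaffold ordering, anchoring, and contiguity more concrete, the following is a minimal Python sketch. It is not the MaGuS algorithm: the function names, the anchor tuple layout `(scaffold_id, map_contig_id, map_position)`, and the majority-vote placement rule are all assumptions chosen for illustration. The N50 statistic is a widely used contiguity metric, but the text above does not state which metrics MaGuS reports.

```python
from collections import defaultdict
from statistics import median


def order_scaffolds_by_map(anchors):
    """Order scaffolds along map contigs (illustrative only, not MaGuS).

    anchors: iterable of (scaffold_id, map_contig_id, map_position) tuples,
    a hypothetical layout for markers shared between a scaffold and a map.
    Returns {map_contig_id: [scaffold_id, ...]} sorted by median position.
    """
    # Collect anchor positions per scaffold and per map contig.
    votes = defaultdict(lambda: defaultdict(list))
    for scaffold, map_contig, position in anchors:
        votes[scaffold][map_contig].append(position)

    placed = defaultdict(list)
    for scaffold, by_contig in votes.items():
        # Assumption: place each scaffold on the map contig with most anchors.
        best = max(by_contig, key=lambda contig: len(by_contig[contig]))
        placed[best].append((median(by_contig[best]), scaffold))

    # Sort scaffolds on each map contig by their median anchor position.
    return {
        contig: [scaffold for _, scaffold in sorted(items)]
        for contig, items in placed.items()
    }


def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    example = [
        ("s2", "m1", 500), ("s1", "m1", 100), ("s1", "m1", 120),
        ("s2", "m1", 520), ("s3", "m2", 10), ("s1", "m2", 5),
    ]
    print(order_scaffolds_by_map(example))
    print(n50([100, 200, 300, 400]))
```

In this sketch, a scaffold with anchors on several map contigs (such as `s1` in the example) is placed by simple majority; a real tool would more likely flag such a scaffold as a candidate assembly error for inspection.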

