Abstract

Hi-C exploits contact frequencies between pairs of loci to bridge and order contigs during genome assembly, yielding chromosome-level assemblies. Because few robust programs are available for this type of data, we developed instaGRAAL, a complete overhaul of the GRAAL program that adapts it for the efficient assembly of large genomes. instaGRAAL features a number of improvements over GRAAL, including a modular correction approach that optionally integrates independent data. We validate the program using data from two brown algae and from human, generating near-complete assemblies with minimal human intervention.
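The bridging principle in the first sentence can be illustrated with a deliberately simple heuristic, which is not instaGRAAL's method. Hi-C contacts are more frequent between loci that are close in the genome, so contigs that share many read pairs are likely neighbors. The sketch below assumes a hypothetical input of (contig, contig) tuples, one per aligned read pair, and reports each contig's most strongly linked partner. The function names are hypothetical, and real pipelines typically also correct for biases such as contig length.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical input format: the two contigs each mate of a Hi-C read pair aligned to.
ReadPair = Tuple[str, str]


def inter_contig_counts(pairs: Iterable[ReadPair]) -> Counter:
    """Count Hi-C read pairs linking two different contigs (unordered pairs)."""
    counts: Counter = Counter()
    for a, b in pairs:
        if a == b:
            continue  # intra-contig contacts carry no information on contig order
        counts[tuple(sorted((a, b)))] += 1
    return counts


def strongest_partner(contig: str, counts: Counter) -> Optional[str]:
    """Return the contig sharing the most read pairs with `contig`, or None."""
    best, best_n = None, 0
    for (a, b), n in counts.items():
        if contig in (a, b) and n > best_n:
            best, best_n = (b if a == contig else a), n
    return best


if __name__ == "__main__":
    pairs = [("c1", "c2")] * 10 + [("c2", "c3")] * 7 + [("c1", "c3")] + [("c1", "c1")] * 50
    counts = inter_contig_counts(pairs)
    for c in ("c1", "c2", "c3"):
        print(c, "->", strongest_partner(c, counts))
```

Chaining such strongest links suggests a contig order (here c1, c2, c3). A greedy rule like this breaks down on noisy or repetitive data, which is one motivation for the probabilistic approach used by GRAAL and instaGRAAL.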

Highlights

  • Continuous developments in DNA sequencing technologies aim at alleviating the technical challenges that limit the ability to assemble sequence data into full-length chromosomes [1,2,3]

  • The core principles of GRAAL and instaGRAAL are similar: both exploit a Markov Chain Monte Carlo (MCMC) approach to perform a series of permutations of genome fragments

  • The analysis indicated that many of the rearrangements found in the linkage group (LG) v2 assembly were potential errors, and that both GRAAL and instaGRAAL efficiently placed large regions at their correct positions in the genome, albeit less accurately with GRAAL and without the correction step
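The MCMC principle described in the second highlight can be sketched as follows. This is a simplified illustration under stated assumptions, not the GRAAL or instaGRAAL implementation: contact frequency is assumed to decay as a power law (`scale * d ** -alpha`) of the distance between fragment midpoints, observed counts are scored with a Poisson log-likelihood, and only two move types are used (swapping two fragments and reversing a block of fragments). All function names and parameters (`alpha`, `scale`, `temperature`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def expected_contacts(order: Sequence[int], lengths: Sequence[float],
                      alpha: float = 1.0, scale: float = 1000.0) -> Matrix:
    """Expected contacts per fragment pair under an assumed power-law decay."""
    n = len(order)
    midpoint = [0.0] * n
    cursor = 0.0
    for frag in order:  # lay fragments end to end in the proposed order
        midpoint[frag] = cursor + lengths[frag] / 2.0
        cursor += lengths[frag]
    exp = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = abs(midpoint[i] - midpoint[j])  # always > 0 for positive lengths
            exp[i][j] = exp[j][i] = scale * dist ** (-alpha)
    return exp


def log_likelihood(order: Sequence[int], lengths: Sequence[float], observed: Matrix,
                   alpha: float = 1.0, scale: float = 1000.0) -> float:
    """Poisson log-likelihood of observed contacts, dropping the constant log(k!)."""
    exp = expected_contacts(order, lengths, alpha, scale)
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            lam = exp[i][j]
            total += observed[i][j] * math.log(lam) - lam
    return total


def mcmc_scaffold(lengths: Sequence[float], observed: Matrix, n_iter: int = 5000,
                  temperature: float = 1.0, seed: int = 0):
    """Metropolis sampling over fragment orders; returns the best order seen."""
    rng = random.Random(seed)
    n = len(lengths)
    order = list(range(n))
    rng.shuffle(order)  # random starting scaffold
    current = log_likelihood(order, lengths, observed)
    best_order, best_ll = order[:], current
    for _ in range(n_iter):
        i, j = sorted(rng.sample(range(n), 2))
        proposal = order[:]
        if rng.random() < 0.5:
            proposal[i], proposal[j] = proposal[j], proposal[i]  # swap two fragments
        else:
            proposal[i:j + 1] = proposal[i:j + 1][::-1]  # reverse a block of fragments
        proposed = log_likelihood(proposal, lengths, observed)
        delta = proposed - current
        # Both moves are symmetric proposals, so the plain Metropolis rule applies.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            order, current = proposal, proposed
            if current > best_ll:
                best_order, best_ll = order[:], current
    return best_order, best_ll


if __name__ == "__main__":
    lengths = [5.0, 3.0, 8.0, 2.0, 6.0, 4.0]
    truth = list(range(len(lengths)))
    observed = expected_contacts(truth, lengths)  # noise-free synthetic contacts
    order, ll = mcmc_scaffold(lengths, observed)
    print(order, round(ll, 2))
```

Because the synthetic counts equal the expectations for the true order, that order (and its mirror image, which has identical distances) maximizes the likelihood; the sampler's job is to find it from a random start. Orientation of individual fragments is ignored here for brevity.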


Summary

Introduction

Continuous developments in DNA sequencing technologies aim to alleviate the technical challenges that limit the ability to assemble sequence data into full-length chromosomes [1,2,3]. Conventional assembly programs and pipelines often struggle to close gaps in draft genome assemblies caused by regions enriched in repeated elements. These assemblers efficiently merge overlapping reads into contiguous sequences (contigs) but have difficulty linking these contigs together into scaffolds. The development of long-read sequencing technologies and accompanying assembly programs has considerably alleviated these difficulties, but some gaps remain in genome scaffolds, notably within long repeated or low-complexity DNA sequences.


