Abstract

Background: Closing gaps in draft genomes is an important post processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools don’t fully utilize the information contained in the sequence data. For example, while it is known that many gaps are caused by genomic repeats, existing tools often ignore many sequence reads that originate from a repeat-related gap.

Results: We compare GAPPadder with GapCloser, GapFiller and Sealer on one bacterial genome, human chromosome 14 and the human whole genome with paired-end and mate-paired reads with both short and long insert sizes. Empirical results show that GAPPadder can close more gaps than these existing tools. Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps for draft genomes assembled with long reads. We show GAPPadder can close gaps on the bed bug genome and the Asian sea bass genome that are assembled partially and fully with long reads respectively. We also show GAPPadder is efficient in both time and memory usage.

Conclusion: In this paper, we propose a new approach called GAPPadder for gap closing. The main advantage of GAPPadder is that it uses more information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps. We show that these repeat-associated reads are useful for gap closing, even though they are ignored by all existing tools. Other main features of GAPPadder include utilizing the information in sequence reads with different insert sizes and performing two-stage local assembly of gap sequences. The results show that our method can close more gaps than several existing tools. The software tool, GAPPadder, is available for download at https://github.com/Reedwarbler/GAPPadder.

Highlights

  • Closing gaps in draft genomes is an important post processing step in genome assembly

  • We develop a new approach called GAPPadder for closing gaps on draft genomes

  • We choose the draft genome assembled by ALLPATHS-LG
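As a rough illustration of what a "gap" in a draft genome looks like in practice, the sketch below locates gaps in a scaffold sequence. It assumes the common convention that gaps are written as runs of `N` characters; the function name `find_gaps` and the `min_len` parameter are hypothetical and are not taken from GAPPadder.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of runs of N in a scaffold.

    Assumption: gaps are encoded as consecutive 'N' (or 'n') characters,
    which is a common convention for scaffolds but is not confirmed by
    the source for any particular assembler.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    # Toy scaffold with two gaps of length 5 and 2.
    scaffold = "ACGTNNNNNACGTTGCANNACG"
    for start, end in find_gaps(scaffold):
        print(f"gap at [{start}, {end}) length {end - start}")
```

A gap-closing tool would then try to replace each such interval with real sequence recovered from the reads; that step is not shown here.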


Summary

Methods

(Table residue: comparison of GAPPadder and GapCloser by insert size, with columns including "2283 to 2803" and "Combined", and categories "Fully closed", "Reported", "Validated" and "Extended".)

Three groups of data for human chromosome 14, with insert sizes of 180 bp, about 2500 bp and about 35 kb, and their combination are used for comparison. Results are given for reads with 180 bp insert size only, for reads with 2500 bp insert size only, and for the combined reads (180 bp, 2500 bp and 35 kb insert sizes). GAPPadder fully closes 14,925 out of 19,476 reported gaps and extends 37,802 out of 52,879 reported gaps. For GapCloser, 2737 gaps are fully closed (3299 are reported) and 20 gaps are partially extended (2417 are reported); 6 of the fully closed gaps and 1 of the partially extended gaps are validated in the same way.
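To put the GAPPadder counts above on a common scale, they can be expressed as fractions of the reported gaps. The sketch below does only that arithmetic. It assumes that each "reported" number is the right denominator for the count quoted with it, which the flattened summary does not state unambiguously; the helper name `fraction` is illustrative.

```python
def fraction(count: int, reported: int) -> float:
    """Return count / reported, rejecting a non-positive denominator."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return count / reported


# Counts quoted in the summary for GAPPadder.
# Assumption: the second number in each pair is the matching denominator.
gappadder = {
    "fully closed": (14925, 19476),
    "extended": (37802, 52879),
}

if __name__ == "__main__":
    for category, (count, reported) in gappadder.items():
        share = fraction(count, reported)
        print(f"{category}: {count}/{reported} = {share:.1%}")
```

Under that reading, roughly three quarters of the reported gaps in each category are accounted for; the GapCloser figures are left out because their pairing in the summary is less clear.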
