Abstract

BackgroundAdvances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. The analysis results are highly dependent on the quality of the genome assemblies used. Assessment of the assembly accuracy may significantly increase the reliability of the analysis results and is therefore of great importance.ResultsHere, we present a new tool called NucBreak aimed at localizing structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. The approach taken by existing alternative tools is based on analysing reads that do not map properly to the assembly, for instance discordantly mapped reads, soft-clipped reads and singletons. NucBreak uses an entirely different and unique method to localise the errors. It is based on analysing the alignments of reads that are properly mapped to an assembly and exploit information about the alternative read alignments. It does not annotate detected errors. We have compared NucBreak with other existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam as well as with several structural variant detection tools, including BreakDancer, Lumpy, and Wham, by using both simulated and real datasets.ConclusionsThe benchmarking results have shown that NucBreak in general predicts assembly errors of different types and sizes with relatively high sensitivity and with lower false discovery rate than the other tools. Such a balance between sensitivity and false discovery rate makes NucBreak a good alternative to the existing assembly accuracy assessment tools and SV detection tools. NucBreak is freely available at https://github.com/uio-bmi/NucBreak under the MPL license.

Highlights

  • Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms

  • We examine the ability of NucBreak as well as REAPR, FRCbam, Pilon, BreakDancer, Lumpy, and Wham to detect assembly errors in real and simulated datasets

  • Accuracy assessment in an assembly obtained from real reads We explored the ability of NucBreak, Pilon, REAPR, FRCbam, Lumpy, BreakDancer to detect errors in assemblies obtained from real reads

Read more

Summary

Introduction

Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. Advances in whole genome sequencing technologies have led to a greatly increased number of organisms with sequenced genomes over the recent years. This has provided the opportunity to make genomic and comparative genomic analysis of a vast variety of organisms. There are several tools developed for genome assembly accuracy assessment, i.e. REAPR [2], FRCbam [3] and Pilon [4]. These tools identify regions with various inconsistencies in the alignments of reads mapped back to the assembly and detect the locations of assembly errors. Pilon enables detection of small insertions, deletions and substitutions and performs local assembly to fix detected assembly errors where possible

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call