Abstract

BackgroundComparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration.ResultsWe here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff.ConclusionsWe have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.

Highlights

  • Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other

  • They should be able to deal with very fragmented genomes, structural rearrangements, genome sequence duplications, and various differences that are often related to repeated regions

  • A unique feature of NucDiff is that it provides detailed information about the exact genomic locations of the differences in the form of four GFF3 files: two files with information for small and medium local differences that do not cause alignment fragmentation, two others for structural differences and local differences that cause alignment fragmentation

Read more

Summary

Introduction

Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration. For the purpose of detecting differences between sequence sets, tools that can be used to perform WGA analysis should come with certain features They should be able to deal with very fragmented genomes, structural rearrangements, genome sequence duplications, and various differences that are often related to repeated regions. For example, be used for scaffolding purposes, for reference-assisted genome assembly, assembly error detection, and comparison of different assemblies They should enable visualisations of alignment results at different levels of detail. Global scale visualisation can be used for examining duplications, structural rearrangements, and uncovered regions, while local scale visualisation can provide information about small differences, such as substitutions, insertions and deletions (collectively called ‘indels’)

Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.