Abstract
Background
Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exist a number of methods for reconstructing the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often produce only partial, error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting conflicts for further investigation. These tasks may be labor intensive if performed manually.
Results
We present CAMSA, a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated in comparison to existing methods. Namely, it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of total running time.
Conclusions
CAMSA addresses the current deficiency of tools for automated comparison and analysis of multiple assemblies of the same set of scaffolds. Since there exist numerous methods and techniques for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for the developers of scaffold assembly algorithms and for the researchers focused on improving draft assemblies of specific organisms.
Highlights
Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form.
Analysis and visualization: For the purpose of comparative analysis and visualization of the input scaffold assemblies, CAMSA utilizes breakpoint graphs, a data structure traditionally used for analysis of gene orders across multiple species [30].
Dealing with unoriented scaffolds: While conventional multiple breakpoint graphs are constructed for sequences of oriented genes, in CAMSA we extend scaffold assembly graphs to support assemblies that may include oriented as well as unoriented scaffolds.
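To make the graph idea above concrete, here is a minimal Python sketch of a breakpoint-graph-style structure for scaffold assemblies. It is an illustration under stated assumptions, not CAMSA's actual implementation or API. The function names (candidate_edges, build_graph, shared_edges), the tuple format for an adjacency between two scaffolds, the "h"/"t" extremity labels, and the "?" marker for an unknown orientation are all hypothetical. In particular, treating an unoriented scaffold by enumerating both of its possible orientations is an assumption of this sketch; the text above does not specify how CAMSA handles that case.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input format: an adjacency (s1, o1, s2, o2) says that scaffold
# s1 is immediately followed by scaffold s2. Orientations are "+" (forward),
# "-" (reverse) or "?" (unknown). Each scaffold has two extremities,
# "t" (tail) and "h" (head); a forward scaffold is read from tail to head.
EXIT_SIDE = {"+": ["h"], "-": ["t"], "?": ["h", "t"]}   # side we leave s1 from
ENTRY_SIDE = {"+": ["t"], "-": ["h"], "?": ["t", "h"]}  # side we enter s2 at


def candidate_edges(s1, o1, s2, o2):
    """Return every extremity-to-extremity edge consistent with the adjacency.

    Assumption of this sketch: an unknown orientation is expanded into both
    possibilities, so a fully oriented adjacency gives 1 edge, a half-oriented
    one gives 2, and a fully unoriented one gives 4. Assumes s1 != s2.
    """
    return [
        frozenset({(s1, a), (s2, b)})
        for a, b in product(EXIT_SIDE[o1], ENTRY_SIDE[o2])
    ]


def build_graph(assemblies):
    """Map each edge to the set of assemblies (its 'colors') that support it."""
    edge_colors = defaultdict(set)
    for name, adjacencies in assemblies.items():
        for s1, o1, s2, o2 in adjacencies:
            for edge in candidate_edges(s1, o1, s2, o2):
                edge_colors[edge].add(name)
    return edge_colors


def shared_edges(edge_colors):
    """Return the edges supported by at least two different assemblies."""
    return [edge for edge, colors in edge_colors.items() if len(colors) >= 2]


# Toy input: two assemblies of the same three scaffolds.
assemblies = {
    "A1": [("s1", "+", "s2", "+"), ("s2", "+", "s3", "-")],
    "A2": [("s1", "+", "s2", "?"), ("s2", "+", "s3", "+")],
}
graph = build_graph(assemblies)
agreed = shared_edges(graph)
```

In this toy input, both assemblies are consistent with joining the head of s1 to the tail of s2, so that edge carries both colors, while the two assemblies disagree on how s3 attaches to the head of s2. Keeping the per-edge color sets is what allows agreement and disagreement between assemblies to be read directly off the graph.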
Summary
Despite the recent progress in genome sequencing and assembly, the majority of currently available genomes come in a draft form, represented by a large number of scaffolds rather than complete chromosomes [1]. Such draft genomes consist of genomic fragments (scaffolds) whose positions and orientations along the genome are unknown. This emphasizes the need for improving the assembly quality of genomes by constructing longer scaffolds from the given ones, which we refer to as the scaffold assembly problem. The scaffold assembly problem asks for reconstruction of the order of input scaffolds along the genome chromosomes.