Abstract

Scaffolding is an essential step during the de novo sequence assembly process to infer the direction and order relationships between the contigs and make the sequence assembly results more continuous and complete. However, scaffolding still faces the challenges of repetitive regions in genome, sequencing errors and uneven sequencing depth. Moreover, the accuracy of scaffolding greatly depends on the quality of contigs. Generally, the existing scaffolding methods construct a scaffold graph, and then optimize the graph by deleting spurious edges. Nevertheless, due to the wrong joints between contigs, some correct edges connecting contigs may be deleted. In this study, we present a novel scaffolding method SCOP, which is the first method to classify the contigs and utilize the vertices and edges to optimize the scaffold graph. Specially, SCOP employs alignment features and GC-content of paired reads to evaluate the quality of contigs (vertices), and divide the contigs into three types (True, Uncertain and Misassembled), and then optimizes the scaffold graph based on the classification of contigs together with the alignment of edges. The experiment results on the datasets of GAGE-A and GAGE-B demonstrate that SCOP performs better than 12 other competing scaffolders. SCOP is publicly available for download at https://github.com/bioinfomaticsCSU/SCOP. Supplementary data are available at Bioinformatics online.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.