LDscaff: LD-based scaffolding of de novo genome assemblies

Zicheng Zhao, Changfa Wang, Yingxiao Zhou, Shuaicheng Li, Shuai Wang, Xiuqing Zhang

doi:10.1186/s12859-020-03895-7

Open Access

https://doi.org/10.1186/s12859-020-03895-7

Abstract

Background: Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which combines several sequencing technologies, increases both contiguity and accuracy. While such approaches require costly additional sequencing, the information in millions of existing whole-genome sequencing datasets has not been fully used for scaffolding. Genetic recombination patterns in population data, which reflect non-random association among alleles at different loci, can provide physical distance signals to guide scaffolding.

Results: In this paper, we propose LDscaff, a method for draft genome assembly that incorporates linkage disequilibrium (LD) information from population data. We evaluated its performance on both simulated and real data. We simulated scaffolds by splitting the pig reference genome and then reassembled them, introducing gaps between scaffolds ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the giant panda and donkey genomes, both of which were assembled purely from NGS data. After LDscaff treatment, the resulting panda assembly has a scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The N50 of the reassembled donkey genome improved from 23.8 MB to 32.1 MB.

Conclusions: Our method effectively improves assemblies using existing re-sequencing data and is a potential alternative to existing assemblers that require the collection of new data.
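Two quantities in the abstract can be made concrete with a short sketch. The abstract does not say which LD statistic LDscaff uses, so the example below assumes the standard pairwise r² between two biallelic loci computed from phased 0/1 haplotypes; it is an illustration, not the authors' implementation. The function names `r_squared` and `n50` are hypothetical. The N50 helper follows the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two biallelic loci.

    Assumption: inputs are phased haplotypes coded 0/1, one entry per
    sampled chromosome. This is a generic LD measure, not necessarily
    the statistic used inside LDscaff.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two equally long, non-empty haplotype vectors")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic locus: treated as no LD here
    d = p_ab - p_a * p_b                      # coefficient of linkage disequilibrium D
    return d * d / denom


def n50(lengths: Sequence[int]) -> int:
    """N50 of a set of scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:              # reached half of the assembly
            return length
    raise AssertionError("unreachable for non-empty input")
```

Because LD tends to decay with physical distance, markers near the ends of two truly adjacent scaffolds would be expected to show higher r² than markers on distant scaffolds, which is the kind of signal the abstract describes as guiding scaffolding.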

Highlights

Genome assembly is fundamental for de novo genome analysis
Long-range scaffolding technologies can provide long-range connectivity, which can aid in resolving the complex regions
Fosmid cloning is sensitive to the quantity and quality of the input DNA, while fosmid libraries are subject to cloning bias

Summary

Introduction

Hybrid assembly, which combines several sequencing technologies, increases both contiguity and accuracy. While such approaches require costly additional sequencing, the information in millions of existing whole-genome sequencing datasets has not been fully used for scaffolding. Long-range scaffolding technologies can provide long-range connectivity, which can aid in resolving complex regions. Such methods include end sequencing of fosmid clones [1], fosmid-based dilution pool sequencing [13, 14], optical mapping [15, 16, 17], genetic mapping with restriction site associated DNA (RAD) tags [18], and proximity ligation (Hi-C) sequencing. The data-generating process for optical map construction involves mostly manual steps, including DNA extension and image capture, which are low throughput and inefficient. Although Hi-C data provide extensive links covering large distances, the current resolution is not high enough for the local ordering of small adjacent contigs.

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2020
Citations: 3	License type: open-access

Similar Papers

Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.
Francesca Bertolini ... Vincenzo Chiofalo
PLOS ONE | VOL. 10 | 07 Jul 2015

Genome collinearity analysis illuminates the evolution of donkey chromosome 1 and horse chromosome 5 in perissodactyls: A comparative study
Shaohua Li ... Jinfeng Wang
BMC Genomics | VOL. 22 | 15 Sep 2021

Simulating Realistic Continuous Glucose Monitor Time Series By Data Augmentation.
Louis A Gomez ... R Stanley Hum
Journal of diabetes science and technology | VOL. - | 23 Jun 2023

MixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes
Bing Song ... John Planz
BMC Bioinformatics | VOL. 22 | 06 Jan 2021
