Abstract

Background: Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available.

Findings: In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico.

Conclusions: We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.

Highlights

  • Long-range sequencing data is, depending on the project, either expensive to generate or unobtainable in the first place. To remove that requirement, we developed a workflow to aid genome assembly that needs only paired-end read data from the query organism and uses available reference genomes as a basis for generating long-range information by constructing mate-pair or scaffolding libraries in silico (Fig. 1)
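As a rough illustration of what constructing a mate-pair library in silico could look like, the sketch below pairs query reads whose mapping positions on a reference lie approximately one insert size apart. It is not the published pipeline: the `MappedRead` record, the `build_mate_pairs` function, the start-to-start distance definition, and the choice to reverse-complement the second mate are all assumptions made for illustration, and read mapping is assumed to have been done beforehand.

```python
from bisect import bisect_left
from typing import NamedTuple


class MappedRead(NamedTuple):
    """Hypothetical record for a query read already mapped to a reference."""
    name: str
    pos: int   # 0-based leftmost mapping position on the reference
    seq: str   # read sequence from the query organism


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_mate_pairs(reads, insert_size, tolerance):
    """Pair reads whose start positions are about insert_size apart.

    Each read is used at most once. The second mate is reverse-complemented
    here; the orientation an assembler expects is an assumption and may differ.
    """
    reads = sorted(reads, key=lambda r: r.pos)
    positions = [r.pos for r in reads]
    used = set()
    pairs = []
    for i, left in enumerate(reads):
        if i in used:
            continue
        # First candidate whose start is at least insert_size - tolerance away.
        lo = bisect_left(positions, left.pos + insert_size - tolerance)
        for j in range(max(lo, i + 1), len(reads)):
            if positions[j] > left.pos + insert_size + tolerance:
                break  # every later read is too far away
            if j not in used:
                used.update((i, j))
                mate = reads[j]._replace(seq=revcomp(reads[j].seq))
                pairs.append((left, mate))
                break
    return pairs


if __name__ == "__main__":
    demo = [
        MappedRead("r1", 0, "ACGT"),
        MappedRead("r2", 1000, "AACC"),
        MappedRead("r3", 50, "GGGG"),
        MappedRead("r4", 1060, "TTTT"),
    ]
    for left, mate in build_mate_pairs(demo, insert_size=1000, tolerance=20):
        print(left.name, mate.name, mate.pos - left.pos)
```

Running the same pairing with several insert sizes would yield several simulated libraries, each carrying distance information at a different scale; how many libraries and which sizes to use is left open here.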


Summary

Introduction

Complete and well-annotated genomes provide a wealth of information about the past, present, and future of species and individuals, and constitute highly valuable resources for medical and biological research [1].
