Abstract
Background: Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. Findings: In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. Conclusions: We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
Highlights
Long-range sequencing data is, depending on the project, either expensive to generate or unobtainable in the first place. To remove the need for it, we developed a workflow to aid genome assembly that requires only paired-end read data from the query organism. The workflow uses available reference genomes as a basis for generating long-range information by constructing mate-pair or scaffolding libraries in silico (Fig. 1).
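The idea of constructing a mate-pair library in silico can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: it assumes a reference-guided sequence is already available as a plain string, and the function names, the forward/reverse read orientation, and the parameters (`insert_size`, `read_length`, `step`) are hypothetical choices made for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def in_silico_mate_pairs(sequence: str, insert_size: int,
                         read_length: int, step: int) -> list[tuple[str, str]]:
    """Sample read pairs whose outer ends lie insert_size bases apart.

    Assumption: pairs are emitted in forward/reverse orientation, and any
    pair touching an undetermined base (N) is skipped.
    """
    pairs = []
    last_start = len(sequence) - insert_size
    for start in range(0, last_start + 1, step):
        fragment = sequence[start:start + insert_size]
        left = fragment[:read_length]
        right = reverse_complement(fragment[-read_length:])
        if "N" in left.upper() or "N" in right.upper():
            continue  # do not carry gaps into the synthetic library
        pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    # Toy sequence; real inputs would be chromosome-scale.
    for left, right in in_silico_mate_pairs("ACGTACGTAA", 8, 3, 1):
        print(left, right)
```

Each emitted pair encodes a known distance between two short reads, which is the kind of long-range information an assembler's scaffolding step can use. Generating several libraries with different `insert_size` values would mimic mate-pair libraries of different spans.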
Summary
Complete and well-annotated genomes provide a wealth of information about the past, present, and future of species and individuals, and constitute highly valuable resources for medical and biological research [1].