Scaffolder - Software for Reproducible Genome Scaffolding.

Michael Barton, Hazel Barton

doi:10.1038/npre.2011.5779.1

Abstract

Background: Assembly of short-read sequencing data can result in a fragmented, non-contiguous series of genomic sequences. A common step in a genome project is therefore to join neighboring sequence regions together and to fill gaps in the assembly using additional sequences. This scaffolding step, however, is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together, these considerations can make it difficult to reproduce or edit an existing genome build.

Methods: The software outlined here, “Scaffolder,” is implemented in the Ruby programming language and can be installed via the RubyGems package manager. Genome scaffolds are defined using YAML, a data format that is both human- and machine-readable. Command-line binaries and extensive documentation are available.

Results: This software allows a genome build to be defined in terms of its constituent sequences, using a relatively simple syntax to describe the scaffold. The syntax also allows unknown regions to be defined and additional sequences to be added to fill gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit than a FASTA nucleotide sequence.

Conclusions: Scaffolder is easy-to-use genome scaffolding software. The tool promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs.
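The abstract describes a scaffold as an ordered list of sequences, unknown regions, and gap-filling inserts. The Ruby sketch below illustrates that idea under stated assumptions: the YAML keys (`sequence`, `source`, `reverse`, `inserts`, `open`, `close`, `unresolved`, `length`), the 1-based inclusive coordinates, and the in-memory sequences are all hypothetical. This is not Scaffolder's documented file format or API, only a toy model of building a scaffold from plain-text instructions.

```ruby
require "yaml"

# Hypothetical input sequences keyed by name (stand-ins for FASTA records).
# contig1 contains an internal gap of three Ns at positions 4..6.
SEQUENCES = {
  "contig1" => "ATGNNNTA",
  "contig2" => "GGATCCAA",
  "gapfill" => "CCG"
}.freeze

# Hypothetical scaffold description. These keys are illustrative
# assumptions, not the documented Scaffolder file format.
SCAFFOLD = <<~YAML_TEXT
  - sequence:
      source: contig1
      inserts:
        - source: gapfill
          open: 4
          close: 6
  - unresolved:
      length: 5
  - sequence:
      source: contig2
      reverse: true
YAML_TEXT

# Reverse-complement a DNA string (N stays N).
def reverse_complement(seq)
  seq.reverse.tr("ACGTNacgtn", "TGCANtgcan")
end

# Resolve one "sequence" entry: copy the named sequence, splice in any
# gap-filling inserts, then optionally reverse-complement the result.
def resolve_sequence(spec, sequences)
  seq = sequences.fetch(spec["source"]).dup
  # Apply inserts right-to-left so earlier coordinates are not shifted.
  Array(spec["inserts"]).sort_by { |ins| -ins["open"] }.each do |ins|
    # open/close are treated as 1-based, inclusive positions (an assumption).
    seq[(ins["open"] - 1)..(ins["close"] - 1)] = sequences.fetch(ins["source"])
  end
  spec["reverse"] ? reverse_complement(seq) : seq
end

# Build the scaffold by concatenating every entry in file order.
def build_scaffold(entries, sequences)
  entries.map do |entry|
    if entry.key?("sequence")
      resolve_sequence(entry["sequence"], sequences)
    elsif entry.key?("unresolved")
      # Region of known length but unknown bases: fill with Ns.
      "N" * entry["unresolved"]["length"]
    else
      raise ArgumentError, "unknown entry: #{entry.keys.inspect}"
    end
  end.join
end

entries = YAML.safe_load(SCAFFOLD)
puts build_scaffold(entries, SEQUENCES) # prints ATGCCGTANNNNNTTGGATCC
```

Because the scaffold is just a list in a text file, editing the build (for example, flipping `reverse` or changing an unresolved length) means changing one line and regenerating, rather than hand-editing nucleotide sequence. Inserts are applied from right to left so that a splice which changes the sequence length does not invalidate the coordinates of inserts further left.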

Highlights

Assembly of short-read sequencing data can result in a fragmented, non-contiguous series of genomic sequences
Plain-text scaffold files written in YAML syntax specify how these sequences should be joined, and the Scaffolder software generates the scaffold sequence from these instructions
Scaffolder is software aimed at both bioinformaticians and biologists familiar with the command line who wish to build a genome scaffold from a set of contigs

Summary

Introduction

Assembly of short-read sequencing data can result in a fragmented, non-contiguous series of genomic sequences. A common step in a genome project is to join neighbouring sequence regions together and fill gaps in the assembly using additional sequences. This scaffolding step is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Assembly software takes the nucleotide reads produced by sequencing hardware and, in the ideal case, outputs a single complete genome sequence composed of these individual fragments. An analogy for this process is a jigsaw puzzle: each nucleotide read represents a single piece, and the final genome sequence is the completed puzzle. In practice, the assembly may instead be fragmented. This may be due to too few overlaps between reads, or to multiple conflicting overlaps, which is analogous to missing jigsaw pieces or to pieces that fit with several other pieces.
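The jigsaw analogy rests on finding overlaps between reads. The Ruby sketch below is a naive, illustrative suffix-prefix overlap check over four made-up reads; the read names, sequences, and the 3-base minimum overlap are assumptions, and this is not how Scaffolder or any particular assembler works. It reproduces the two failure cases described in the paragraph: a read with two candidate successors (an ambiguous fit) and a read that overlaps nothing (a missing piece).

```ruby
# Made-up reads for illustration only.
READS = {
  "r1" => "ATGGCGT",
  "r2" => "GCGTACC", # extends r1 via the overlap GCGT
  "r3" => "GCGTTTA", # also extends r1: ambiguous placement
  "r4" => "CCCCAAA"  # overlaps no other read: an isolated piece
}.freeze

# Length of the longest suffix of `a` equal to a prefix of `b`,
# considering only overlaps of at least `min_len` bases (0 if none).
def overlap_length(a, b, min_len)
  [a.length, b.length].min.downto(min_len) do |len|
    return len if a.end_with?(b[0, len])
  end
  0
end

# Map each read name to the names of reads whose prefix overlaps its suffix.
def overlap_graph(reads, min_len)
  reads.to_h do |name, seq|
    hits = reads.reject { |other, _| other == name }
                .select { |_, other_seq| overlap_length(seq, other_seq, min_len) > 0 }
                .keys
    [name, hits]
  end
end

overlap_graph(READS, 3).each do |name, nexts|
  status = case nexts.length
           when 0 then "no successor (read end or gap)"
           when 1 then "unique successor"
           else "ambiguous successors"
           end
  puts "#{name} -> #{nexts.inspect}: #{status}"
end
```

The all-against-all comparison here is quadratic and is used only for readability; real assemblers rely on indexed or graph-based methods to find overlaps at scale.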

Journal: Nature Precedings
Publication Date: Mar 14, 2011
License type: CC BY 3.0

Similar Papers

Scaffolder - software for manual genome scaffolding
Michael D Barton ... Hazel A Barton
Source Code for Biology and Medicine | VOL. 7 | 28 May 2012

Bioinformatic approaches to assigning protein function from novel sequence data.
David Michalovich ... Richard Fagan
Methods in molecular medicine | VOL. 104 | 01 Jan 2004

Data, Data Everywhere …
Richard Glynne
Cell | VOL. 101 | 01 Apr 2000

Surfing the DNA databases for K+ channels nets yet more diversity
Lawrence Salkoff ... Timothy Jegla
Neuron | VOL. 15 | 01 Sep 1995
