Abstract

Methods for assembling sequence reads into longer sequences are described. In many cases, the raw output of a sequencing machine consists of images, which are translated into sequence reads in a subsequent analysis step called base calling. Each position of a read receives a quality score that indicates the probability of a sequencing error at that position. After quality filtering and trimming of adapter sequences or barcode indices, these reads can be assembled de novo into longer sequences. Three main types of assembly strategy are in use: greedy algorithms, overlap-layout-consensus (OLC) assemblers, and methods based on k-mer graphs. Contiguous sequences built from overlapping reads are called contigs. Positional information from paired-end reads or mate pairs can be used to order contigs into scaffolds. In the ideal case of genome sequencing, the number of scaffolds would equal the expected number of chromosomes. Several statistics can be used to describe and compare different assemblies. In general, a variety of programs and parameter settings should be explored to find the best assembly. Different strategies are used for genome, transcriptome and metagenome assemblies, and all of them benefit greatly from the inclusion of long reads. Assembly methods are becoming an increasingly important tool for everyone working with sequence data, since the vast majority of sequence data published at NCBI is deposited as short reads in the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/). These data are usually not directly searchable with tools such as BLAST and must be assembled before further analysis.
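Per-base quality scores are conventionally reported on the Phred scale, where a score Q corresponds to an error probability P = 10^(-Q/10); Q = 20, for example, means a 1 % chance that the base call is wrong. The Python sketch below assumes FASTQ-style quality strings with the common ASCII offset of 33 and trims low-quality bases from the 3' end of a read. The function names and the threshold of 20 are illustrative assumptions, not the behaviour of any particular trimming tool.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def trim_trailing(seq: str, quality: str, min_q: int = 20) -> str:
    """Remove bases from the 3' end while their score is below min_q (illustrative threshold)."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end]


if __name__ == "__main__":
    print(error_probability(20))                  # 0.01
    print(trim_trailing("ACGTACGT", "IIIIII##"))  # ACGTAC ('I' = Q40, '#' = Q2)
```

Real pipelines usually combine trimming like this with adapter removal and sliding-window or read-length filters.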
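Of the three assembly strategies named above, the greedy approach is the simplest to illustrate: it repeatedly joins the two sequences with the longest suffix-prefix overlap until no overlap of at least a minimum length remains. The sketch below is a toy version under strong simplifying assumptions (error-free reads from one strand, exact overlaps only, no reverse complements); `overlap`, `greedy_assemble` and `min_len` are hypothetical names, not taken from any published assembler.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b; 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the longest overlap until no overlap >= min_len is left."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep input order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # the remaining sequences are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATTAGAC", "AGACCTG", "CCTGATA"]))  # ['ATTAGACCTGATA']
```

Because each step commits to the locally best overlap, greedy assembly can join reads incorrectly across repeats, one reason the OLC and k-mer graph approaches were developed.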
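Assemblers based on k-mer graphs typically use a de Bruijn graph: every k-mer occurring in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and contigs are read off the non-branching paths. The Python sketch below assumes error-free reads from a single strand and ignores isolated cycles; `build_debruijn` and `contigs_from_graph` are illustrative names, not the API of any real assembler.

```python
from collections import defaultdict


def build_debruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Each distinct k-mer is an edge from its (k-1)-prefix node to its (k-1)-suffix node."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs_from_graph(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths (isolated cycles are ignored in this sketch)."""
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: reached by walking from a branch point or source
        for nxt in graph.get(start, ()):
            contig = start + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


if __name__ == "__main__":
    print(contigs_from_graph(build_debruijn(["ACGTTG", "GTTGCA"], 4)))  # ['ACGTTGCA']
    print(contigs_from_graph(build_debruijn(["ACGTA", "ACGTC"], 3)))    # ['ACGT', 'GTA', 'GTC']
```

In the second example the node `GT` has two successors, so the graph branches and the reads are split into three contigs; in real data such branches arise from repeats, heterozygosity and sequencing errors.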
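Among the statistics used to describe and compare assemblies, N50 is a common choice (picked here as an example): it is the length of the shortest contig or scaffold among the longest ones that together cover at least half of the total assembly length. It is often reported with L50, the number of such sequences. A minimal sketch, assuming only a list of contig lengths as input; `assembly_stats` is a hypothetical helper name.

```python
def assembly_stats(lengths: list[int]) -> dict[str, int]:
    """Summarise contig (or scaffold) lengths: count, total, longest, N50 and L50."""
    if not lengths:
        raise ValueError("assembly contains no sequences")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = l50 = 0
    for index, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            n50, l50 = length, index
            break
    return {"count": len(ordered), "total": total, "longest": ordered[0],
            "n50": n50, "l50": l50}


if __name__ == "__main__":
    print(assembly_stats([100, 200, 300, 400, 500]))
    # {'count': 5, 'total': 1500, 'longest': 500, 'n50': 400, 'l50': 2}
```

N50 alone rewards long contigs even when they are misassembled, so it is best read alongside other measures when comparing programs and parameter settings.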
