Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using next generation sequence.

Jose M Blanca, Laura Pascual, Peio Ziarsolo, Joaquin Cañizares, Fernando Nuez

doi:10.1186/1471-2164-12-285

Open Access

https://doi.org/10.1186/1471-2164-12-285

Journal: BMC Genomics	Publication Date: Jun 2, 2011
Citations: 57	License type: CC BY 4.0

Affiliation: Universitat Politècnica de València

Abstract

Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application.

Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real.

Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.

Highlights

The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories.
… CT, USA [2]), mappers and file formats (SAMtools [8], VCF [9]). These fast-paced developments have made the field of bioinformatics very dynamic and difficult to follow despite the guidance provided by resources like the SEQanswers internet forum [10], which is dedicated to presenting and documenting the tools used to analyze next generation sequencing (NGS) data.
Once the tomato collection was genotyped, using single nucleotide variations (SNVs) randomly selected from these sets, we found that 3 out of 14 SNVs tested in the Sanger set and 5 out of 12 in the Illumina set were polymorphic, that is, the frequency of the most common allele was lower than 95%.
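
The polymorphism criterion above (most common allele below 95%) can be illustrated with a minimal Python sketch. It is not taken from ngs_backbone: the function names, the flat list-of-alleles input, and the example genotypes are assumptions made for illustration only. The 3/14 and 5/12 counts are the ones reported above.

```python
from collections import Counter


def major_allele_frequency(alleles):
    """Return the frequency of the most common allele in a list of allele calls."""
    counts = Counter(alleles)
    if not counts:
        raise ValueError("no allele calls supplied")
    return max(counts.values()) / sum(counts.values())


def is_polymorphic(alleles, threshold=0.95):
    """Apply the criterion from the text: major allele frequency below 95%.

    The input layout (one allele call per list item) is a hypothetical
    simplification, not the format used by the authors.
    """
    return major_allele_frequency(alleles) < threshold


# Hypothetical allele calls for two SNVs across a genotyped collection.
snv_a = ["A"] * 18 + ["G"] * 2   # major allele at 90%
snv_b = ["C"] * 20               # monomorphic

# Fractions of polymorphic SNVs reported in the text.
sanger_fraction = 3 / 14
illumina_fraction = 5 / 12

if __name__ == "__main__":
    print(is_polymorphic(snv_a), is_polymorphic(snv_b))
    print(f"Sanger: {sanger_fraction:.1%}, Illumina: {illumina_fraction:.1%}")
```

Under these assumptions, the reported counts correspond to roughly 21% polymorphic SNVs in the Sanger set and roughly 42% in the Illumina set.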

Summary

Introduction

The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. … CT, USA [2]) sequence for a non-model species transcriptome or an Illumina-based (Illumina, San Diego, CA, USA [3]) genomic or transcriptomic resequencing of several samples is very affordable. Data from these new sequencing technologies cannot be analyzed with older software designed for Sanger sequencing. … CT, USA [2]), mappers (e.g., bwa [6], Bowtie [7]) and file formats (SAMtools [8], VCF [9]). These fast-paced developments have made the field of bioinformatics very dynamic and difficult to follow despite the guidance provided by resources like the SEQanswers internet forum [10], which is dedicated to presenting and documenting the tools used to analyze NGS data. Both the selection of the various programs and parameters and the creation of these small scripts render the analysis process cumbersome and non-reproducible, especially if the laboratory lacks dedicated bioinformatics staff.
