Abstract

As prokaryotic sequencing becomes more common, a method to analyze these data quickly and accurately is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false-positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single-genome and metagenome sequencing files quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled, and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while keeping error rates low, and it also offers unique data visualization options.

Highlights

  • To overcome the limitations and problems associated with the current methodology for analyzing metagenomic data, and to implement a tool for pre-screening genomic sequencing data, we developed the web-based tool GenomePeek

Introduction

With the cost of sequencing falling, microbial genomes are being sequenced at an increasing rate. There are over 2,000 completed prokaryotic genomes in NCBI (Benson et al, 2009; Sayers et al, 2009), almost 15,000 prokaryote genomes in the SEED database (Overbeek, Disz & Stevens, 2004) and about 75,000 more that are unassembled in the Sequence Read Archive. While complete genome sequencing gives us detailed knowledge about a single prokaryotic species, metagenomic sequencing gives us a broad overview of the microbial environment (Dinsdale et al, 2008). Whether analyzing genomic or metagenomic sequencing data, one of the main goals is to identify the taxonomic origin of the species present (Belda-Ferre et al, 2012; Mande, Mohammed & Ghosh, 2012; Carr, Shen-Orr & Borenstein, 2013; Silva et al, 2014). Ensemble approaches use signature data from all of the reads, such as protein

