Abstract

Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data in which a reference genome does not have to be specified. Using a distributed architecture, we are able to query a remote server for hints about what the reference might be while transferring a relatively small amount of data. Our system consists of a server holding an index of known reference DNA, and a client holding raw sequencing reads. The client sends a sample of unidentified reads and receives in return a list of matching references. Sequences for the references can then be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server that indexes tens of thousands of publicly available genomes and genomic regions from various organisms and returns lists of matching hits for query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a Python script. Both can handle a large number of sequencing reads, run on portable devices (the browser-based client running on a tablet), complete their task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop further in the future, allowing fully automated processing of sequencing data and routine, instant quality checks of sequencing runs from desktop sequencers. Web access is available at http://tapir.cbs.dtu.dk. The source code for a Python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc.
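
The client side of the exchange described above (sample a few reads, send them, receive candidate references) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' client: the function names, the reservoir-sampling strategy, and the JSON payload layout are all hypothetical, and the network call is deliberately left out because the server's actual interface is not described here.

```python
import json
import random
from typing import Iterable, Iterator, List, Optional


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def sample_reads(sequences: Iterable[str], k: int = 100,
                 seed: Optional[int] = None) -> List[str]:
    """Draw a uniform random sample of up to k reads in a single pass.

    Uses reservoir sampling, so the full read set never has to fit in memory.
    This is an assumed strategy chosen for the sketch.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for n, seq in enumerate(sequences):
        if n < k:
            reservoir.append(seq)
        else:
            # Replace an existing entry with probability k / (n + 1).
            j = rng.randint(0, n)
            if j < k:
                reservoir[j] = seq
    return reservoir


def build_query(reads: List[str]) -> str:
    """Serialize the sampled reads as JSON (hypothetical payload layout)."""
    return json.dumps({"reads": reads})
```

With reads of 100 bases, for example, a 100-read sample amounts to about 10,000 characters of sequence, which illustrates why such a query stays small compared with the full sequencing run.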

Highlights

  • The sequencing of DNA has become increasingly affordable during the last decade [1], and modern high-end sequencers have the capacity to process the equivalent of several human genomes or several hundred bacterial genomes.

  • Patients will be sequenced routinely, outbreaks of infectious agents will be traced by their DNA, and the quality of water and food will be monitored with DNA sequencing.

  • Using synthetic reads generated from full genomes, we demonstrate that a pure culture can be identified from raw DNA sequencing reads using as few as 100 random reads.

Introduction

The sequencing of DNA has become increasingly affordable during the last decade [1], and modern high-end sequencers have the capacity to process the equivalent of several human genomes or several hundred bacterial genomes. Current desktop sequencers require a limited initial investment and provide flexibility over sequencing volumes. Complete bacterial genomes can be sequenced from isolates in a day. Recent announcements on nanopore sequencing [2] even suggest that sequencers could become cheap enough to be disposable. Extracting DNA is itself a relatively simple procedure, and it is foreseeable that DNA sequencing will soon be a relatively cheap, routine procedure in molecular biology. Patients will be sequenced routinely, outbreaks of infectious agents will be traced by their DNA, and the quality of water and food will be monitored with DNA sequencing.
