Abstract
Second- and third-generation sequencing technologies are driving a revolution in biology and medicine, with ultra-high-throughput sequencers now able to produce 200 human genomes every 3 days at a cost of $1000 per genome (Watson, 2014). Meanwhile, in our lab at Edinburgh Genomics and in other labs throughout the world, researchers are generating their first single-molecule reads from a hand-held, USB-powered sequencer as part of Oxford Nanopore's MinION access programme (MinION Access Programme, 2014). Whilst the revolution in biology is widely recognized, the associated revolution in bioinformatics often goes unmentioned. This Frontiers "Research Topic" is about that revolution; it is about data, and data-driven discovery. The sequencers mentioned above, and others from Pacific Biosciences and Ion Torrent, produce either huge amounts of data, data that are very complex, or both. Bioinformaticians throughout the world are creating novel pipelines, algorithms and tools to cope with the huge amount of diverse data types being produced. The very first step in many of those pipelines and tools is quality assessment, quality control and artifact removal. These issues all involve data-driven research: what can we learn from the data? What are the data telling us about quality and artifacts?

The first group of papers in the Research Topic deals with quality assessment and reveals pipelines that are in use in sequencing facilities today. The second set of papers deals with applications of sequencing technologies to particular domains, and how we can improve those applications through effective control of quality and artifacts. The final set of papers deals with very specific biological questions, and what we can learn from the raw data to improve our analyses and help us better answer those questions.

A series of bioinformatics pipelines is applied to sequencing data by the data-generating facility, and it is important that those who work with sequencing data understand them. Leggett et al. (2013) reveal many of the pipelines and tools used at The Genome Analysis Centre (TGAC), a genomics institute based in Norwich, UK, which has access to every major sequencing platform. Their paper describes every step in the data-generation pipeline, from their Laboratory Information Management System (LIMS) to data-specific pipelines for mate-pair and RAD-Seq libraries. Similarly, Paszkiewicz et al. (2014)
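As an illustration of the kind of first-pass quality assessment such pipelines perform, the short Python sketch below computes the mean Phred quality at each read position in a FASTQ file. It is a minimal sketch only: the file name reads.fastq, the Phred+33 quality encoding and the Q20 cut-off are illustrative assumptions, not the method of any specific paper cited above.

    # Minimal sketch of per-position quality assessment for FASTQ reads.
    # Assumes Phred+33-encoded qualities; file name and Q20 cut-off are hypothetical.

    def read_fastq(path):
        """Yield (sequence, quality string) pairs from a FASTQ file."""
        with open(path) as handle:
            while True:
                header = handle.readline()
                if not header:
                    break
                seq = handle.readline().strip()
                handle.readline()              # '+' separator line
                qual = handle.readline().strip()
                yield seq, qual

    def mean_quality_per_position(path):
        """Return the mean Phred quality score at each read position."""
        totals, counts = [], []
        for _, qual in read_fastq(path):
            for i, char in enumerate(qual):
                phred = ord(char) - 33         # Phred+33 offset
                if i == len(totals):
                    totals.append(0)
                    counts.append(0)
                totals[i] += phred
                counts[i] += 1
        return [t / c for t, c in zip(totals, counts)]

    if __name__ == "__main__":
        per_position = mean_quality_per_position("reads.fastq")
        # Flag positions whose mean quality drops below an (arbitrary) Q20 threshold.
        low = [i for i, q in enumerate(per_position) if q < 20]
        print(f"Read length: {len(per_position)}; positions below Q20: {low}")

In practice, dedicated tools report this and many other metrics (per-base quality, GC content, adapter contamination), but the sketch shows the data-driven character of the step: the reads themselves tell us where quality degrades and where trimming or filtering is warranted.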