Abstract

Quality control is an essential first step in sequencing data analysis, and software tools for quality control are deeply entrenched in standard pipelines at most sequencing centers. Although the associated computations are straightforward, in many settings the total computing effort required for quality control is appreciable and warrants optimization. We present falco, an emulation of the popular FastQC tool that runs on average three times faster while generating equivalent results. Compared to FastQC, falco also provides greater scalability for datasets with longer reads and more flexible visualization of HTML reports.

Highlights

  • We present example datasets from the public domain where FastQC fails to generate reports even when run on high-performance computing hardware, demonstrating that falco expands the range of possible cases in which these quality control metrics can be applied

  • Falco[13] is a faster alternative for calculating the wide range of QC metrics generated by FastQC

Introduction

High-throughput sequencing is routinely used to profile copy number variations in cancers[1], assemble genomes of microbial organisms[2,3], quantify gene expression[4], identify cell populations from single-cell transcriptomes in a variety of tissues[5] and track epigenetic changes in developing organisms and diseases[6], among numerous other applications. Once generated, high-throughput sequencing data often undergoes common upstream analysis steps: quality control (QC), adapter trimming, filtering of contaminants and low-quality reads, and mapping of reads to a reference genome or transcriptome. Read mapping should be the most computationally expensive step early in analysis pipelines. Even so, the computation required for QC is appreciable and can no longer be ignored when considering the total cost of sequencing.
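To make concrete the kind of per-read computation that QC involves, the following is a minimal sketch of one metric that FastQC reports, the mean base quality at each read position. It is an illustration only, not the implementation used by FastQC or falco. The function name `per_position_mean_quality` is hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
from io import StringIO


def per_position_mean_quality(fastq_lines, offset=33):
    """Return the mean Phred quality at each read position.

    Assumes (for illustration) four-line FASTQ records and Phred+33 encoding.
    """
    sums = []    # running sum of quality scores per position
    counts = []  # number of reads covering each position
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\n")
        for pos, ch in enumerate(qual):
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two toy reads of different lengths ('I' encodes Q40, '5' encodes Q20).
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n5I5\n")
means = per_position_mean_quality(example)
print(means)  # [30.0, 40.0, 30.0, 40.0]
```

Each base of each read is touched once, so the work grows with the total number of sequenced bases. This is why a computation that is simple per read can still add up to an appreciable cost on large datasets.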
