Abstract
Quality control is an essential first step in sequencing data analysis, and software tools for quality control are deeply entrenched in standard pipelines at most sequencing centers. Although the associated computations are straightforward, in many settings the total computing effort required for quality control is appreciable and warrants optimization. We present falco, an emulation of the popular FastQC tool that runs on average three times faster while generating equivalent results. Compared to FastQC, falco also provides greater scalability for datasets with longer reads and more flexible visualization of HTML reports.
Highlights
High-throughput sequencing is routinely used to profile copy number variations in cancers[1], assemble genomes of microbial organisms[2,3], quantify gene expression[4], identify cell populations from single-cell transcriptomes in a variety of tissues[5] and track epigenetic changes in developing organisms and diseases[6], among numerous other applications.
We present example datasets from the public domain where FastQC fails to generate reports even when run on high-performance computing hardware, demonstrating that falco expands the range of cases in which these quality control metrics can be applied.
Falco[13] is a faster alternative for calculating the wide range of QC metrics generated by FastQC.
Summary
High-throughput sequencing is routinely used to profile copy number variations in cancers[1], assemble genomes of microbial organisms[2,3], quantify gene expression[4], identify cell populations from single-cell transcriptomes in a variety of tissues[5] and track epigenetic changes in developing organisms and diseases[6], among numerous other applications. Once generated, high-throughput sequencing data usually undergoes a common set of upstream analysis steps: quality control (QC), adapter trimming, filtering of contaminants and low-quality reads, and mapping of reads to a reference genome or transcriptome. Read mapping should be the most computationally expensive step early in analysis pipelines. Even so, the computation required for QC is appreciable and can no longer be ignored when considering the total cost of sequencing.