QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data

Qian Zhou, Jian Xu, Anhui Wang, Xiaoquan Su, Kang Ning

doi:10.1371/journal.pone.0060234

Abstract

Next-generation sequencing (NGS) technologies have been widely used in life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, are quite common in raw sequencing data and compromise downstream analysis. Quality control (QC) is therefore essential for raw NGS data. Although a few NGS data quality control tools are publicly available, they have two limitations. First, their processing speed cannot keep up with the rapid growth in data volume. Second, with respect to removing contaminating reads, none of them can identify contaminating sources de novo; they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. It comprises a set of user-friendly tools that work together for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. It is optimized for parallel computation, so its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data showed that reads with low sequencing quality could be identified and filtered, and that possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) based on processed reads improved significantly in completeness and accuracy. With regard to processing speed, QC-Chain achieves a 7–8-fold speed-up through parallel computation compared with traditional methods. QC-Chain is therefore a fast and useful quality control tool for read quality processing and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html.
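To make step (1) of the abstract concrete, the following is a minimal sketch of read trimming and filtering by sequencing quality. It is not the Parallel-QC implementation: the `Read` type, the function names, the Phred+33 encoding, and the thresholds (`min_q`, `min_len`, `min_mean_q`) are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container, not a QC-Chain type)."""
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def trim_3prime(read: Read, min_q: int = 20) -> Read:
    """Drop trailing bases whose quality is below min_q (assumed threshold)."""
    scores = phred_scores(read.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def passes(read: Read, min_len: int = 3, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(read.seq) < min_len:
        return False  # also guards against dividing by zero for empty reads
    scores = phred_scores(read.qual)
    return sum(scores) / len(scores) >= min_mean_q


def quality_control(reads: Iterable[Read]) -> Iterator[Read]:
    """Trim each read, then yield only the reads that pass the filter."""
    for read in reads:
        trimmed = trim_3prime(read)
        if passes(trimmed):
            yield trimmed


if __name__ == "__main__":
    raw = [Read("r1", "ACGTAC", "IIII##"), Read("r2", "ACGT", "####")]
    for kept in quality_control(raw):
        print(kept.name, kept.seq, kept.qual)
```

Because each read is handled independently, a routine of this shape can be split across workers, which is the general property a parallel QC tool relies on; the sketch itself runs serially.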

Highlights

Next-generation sequencing (NGS) technologies, which can produce numerous sequences in a single experiment in a relatively short time, have been widely applied in life sciences.
When QC-Chain was applied to the dataset, downstream analysis improved significantly (Table 3). When other tools such as FastQC, FASTX-Toolkit, PRINSEQ or NGS QC were applied, few reads were filtered for low sequencing quality, because the simulated data were designed to consist of high-quality reads, and the analysis results were equivalent to those obtained from the total reads.
The read quality processing module (Parallel-QC), together with an rRNA identification module and in-house scripts, was used in this method to accomplish a comprehensive quality control process.
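The identification and quantification of contaminating sources mentioned above can be illustrated with a small sketch. It assumes that an upstream rRNA identification step has already assigned a taxon label to each rRNA read; that input mapping, the function names, and the `min_fraction` cutoff are hypothetical and are not taken from QC-Chain. The sketch stops at flagging candidate sources, since this excerpt does not describe how flagged reads are then removed.

```python
from collections import Counter
from typing import Dict, Set


def quantify_sources(labels: Dict[str, str]) -> Dict[str, float]:
    """Return the fraction of labelled reads assigned to each taxon.

    `labels` maps read name -> taxon label (assumed output of an
    rRNA identification step; hypothetical format).
    """
    counts = Counter(labels.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def contaminating_taxa(
    fractions: Dict[str, float],
    target: str,
    min_fraction: float = 0.01,  # assumed reporting cutoff
) -> Set[str]:
    """Flag every non-target taxon whose share reaches min_fraction."""
    return {t for t, f in fractions.items() if t != target and f >= min_fraction}


if __name__ == "__main__":
    labels = {
        "r1": "Homo sapiens",
        "r2": "Homo sapiens",
        "r3": "Homo sapiens",
        "r4": "Escherichia coli",
    }
    fractions = quantify_sources(labels)
    print(fractions)
    print(contaminating_taxa(fractions, target="Homo sapiens", min_fraction=0.1))
```

No list of expected contaminants is passed in: candidate sources come only from the labels observed in the data, which is the sense in which the abstract calls the identification de novo.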

Summary

Introduction

Next-generation sequencing (NGS) technologies, which can produce numerous sequences (reads) in a single experiment in a relatively short time, have been widely applied in life sciences. Several kinds of sequencing artifacts, which can have a serious negative impact on downstream analyses, commonly exist in raw reads, regardless of the sequencing platform. These sequence artifacts can be classified into two groups: sequencing quality problems and contamination. For the sequencing quality problem, other than the QC pipelines supplied by the sequencing instrument manufacturers, a few online/standalone tools are publicly available, such as PRINSEQ [2], FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and NGSQC Toolkit [3]. These tools have specific features and were developed based on different concepts and algorithms, yet they are not sufficiently optimized on their own.

Journal: PLoS ONE	Publication Date: Apr 2, 2013
Citations: 80	License type: CC BY 4.0
