Abstract

Background
Next generation sequencing datasets are stored as FASTQ-formatted files. To avoid downstream artefacts, it is critical to implement a robust preprocessing protocol for FASTQ data that determines its integrity and quality.

Results
Here I describe fastQ_brew, a package that provides a suite of methods to evaluate sequence data in FASTQ format and efficiently implements a variety of manipulations to filter sequence data by size, quality and/or sequence. fastQ_brew supports mismatch-tolerant searches for adapter sequences, left- and right-end trimming, and removal of duplicate reads and of reads containing non-designated bases. fastQ_brew also returns summary statistics on the unfiltered and filtered FASTQ data, and offers FASTQ-to-FASTA conversion as well as FASTQ reverse-complement and DNA-to-RNA manipulations.

Conclusions
fastQ_brew is open source and freely available to all users at https://github.com/dohalloran/fastQ_brew.
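To make the trimming and filtering operations listed in the abstract concrete, here is a minimal Python sketch of how they could be chained over FASTQ records held as (header, sequence, quality) tuples. It is not fastQ_brew's implementation or interface: the names `brew_filter` and `find_adapter`, their parameters, defaults and the example adapter string are hypothetical. The adapter search is a simple Hamming-distance scan that only detects full-length adapter occurrences (a partial adapter at the 3' end of a read would be missed), and quality-based filtering is omitted for brevity.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record representation: (header, sequence, quality string).
Record = Tuple[str, str, str]


def find_adapter(seq: str, adapter: str, max_mismatches: int) -> Optional[int]:
    """Return the leftmost start index where `adapter` occurs in `seq` with at most
    `max_mismatches` substitutions (Hamming distance), or None if there is no hit."""
    k = len(adapter)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatches:
            return start
    return None


def brew_filter(records: Iterable[Record], adapter: str, max_mismatches: int = 1,
                trim_left: int = 0, trim_right: int = 0, min_length: int = 20,
                allowed: str = "ACGT") -> Iterator[Record]:
    """Trim, adapter-clip and filter reads; yields only the records that pass."""
    seen = set()
    for header, seq, qual in records:
        # Fixed-length trimming of both ends, keeping sequence and quality aligned.
        end = max(len(seq) - trim_right, 0)
        seq, qual = seq[trim_left:end], qual[trim_left:end]
        # Clip at the first approximate adapter match (adapter and everything 3' of it).
        hit = find_adapter(seq, adapter, max_mismatches)
        if hit is not None:
            seq, qual = seq[:hit], qual[:hit]
        # Drop reads containing bases outside the designated alphabet (e.g. N).
        if any(base not in allowed for base in seq):
            continue
        # Size filter.
        if len(seq) < min_length:
            continue
        # Duplicate removal: keep only the first read with a given sequence.
        if seq in seen:
            continue
        seen.add(seq)
        yield header, seq, qual


reads = [
    ("@r1", "ACGTACGTACGTACGTACGT" + "AGATCGGAAGAGC", "I" * 33),
    ("@r2", "ACGTACGTACGTACGTACGT" + "AGATCGGAAGAGC", "I" * 33),  # duplicate of r1
    ("@r3", "ACGTNCGTACGTACGTACGTACGT", "I" * 24),                 # contains N
]
for header, seq, _ in brew_filter(reads, adapter="AGATCGGAAGAGC", min_length=10):
    print(header, seq)  # prints: @r1 ACGTACGTACGTACGTACGT
```

Clipping the adapter before the length filter means reads that are mostly adapter are discarded, and running duplicate detection last collapses reads that only become identical after trimming.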

Highlights

  • Next generation sequencing datasets are stored as FASTQ-formatted files

  • FASTQ has become the principal format for the exchange of DNA sequencing files [1]

  • To evaluate the quality of a FASTQ dataset and avoid downstream artefacts, users must apply robust quality control and preprocessing steps before any downstream FASTQ applications
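As an illustration of the kind of quality control summary such a step can produce, the Python sketch below parses four-line FASTQ records and reports the read count, length range, mean length, mean per-read Phred quality and GC fraction. It assumes unwrapped records and Phred+33 quality encoding; the function names `read_fastq` and `summarize` and the chosen statistics are hypothetical and do not describe fastQ_brew's actual output.

```python
import io
from statistics import mean
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from unwrapped four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (a blank line also ends parsing in this sketch)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def summarize(records: Iterable[Tuple[str, str, str]], offset: int = 33) -> Dict[str, float]:
    """Basic QC statistics; assumes Phred+33 quality encoding by default."""
    lengths, read_quals = [], []
    gc = bases = 0
    for _, seq, qual in records:
        lengths.append(len(seq))
        scores = [ord(ch) - offset for ch in qual]  # Phred score = ASCII code - offset
        read_quals.append(mean(scores) if scores else 0)
        gc += sum(1 for base in seq.upper() if base in "GC")
        bases += len(seq)
    return {
        "reads": len(lengths),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "mean_length": mean(lengths) if lengths else 0,
        "mean_read_quality": mean(read_quals) if read_quals else 0,
        "gc_fraction": gc / bases if bases else 0.0,
    }


example = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\n!!!!!!\n"
print(summarize(read_fastq(io.StringIO(example))))
```

Running the same summary before and after filtering gives a simple way to compare unfiltered and filtered data.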


Summary

Introduction

Next generation sequencing datasets are stored as FASTQ-formatted files. To avoid downstream artefacts, it is critical to implement a robust preprocessing protocol for FASTQ data that determines its integrity and quality. Here I describe fastQ_brew, a package that performs quality control, reformatting, filtering, and trimming of FASTQ-formatted sequence datasets.

*Correspondence: damienoh@gwu.edu
2 Department of Biological Sciences, The George Washington University, 636 Ross Hall, 2300 I St. N.W., Washington, DC 20052, USA
Full list of author information is available at the end of the article
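The reformatting operations described for fastQ_brew (FASTQ-to-FASTA conversion, reverse complement and DNA-to-RNA) are simple string transformations. The following is a minimal Python sketch, not fastQ_brew's own code; the function names are hypothetical, and it assumes that FASTA output is wrapped at 60 characters by default, that `N` complements to `N`, and that the quality string is reversed alongside the sequence when a read is reverse-complemented.

```python
from typing import Iterable, Tuple

# Complement table for unambiguous bases; N is assumed to complement to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_record(header: str, seq: str, qual: str) -> Tuple[str, str, str]:
    # Reverse the quality string too, so each score stays attached to its base.
    return header, reverse_complement(seq), qual[::-1]


def dna_to_rna(seq: str) -> str:
    """Transcribe DNA to RNA by replacing thymine (T) with uracil (U)."""
    return seq.replace("T", "U").replace("t", "u")


def fastq_to_fasta(records: Iterable[Tuple[str, str, str]], width: int = 60) -> str:
    """Convert (header, sequence, quality) records to FASTA text; quality is discarded."""
    lines = []
    for header, seq, _qual in records:
        name = header[1:] if header.startswith("@") else header
        lines.append(">" + name)  # FASTA headers start with '>' instead of '@'
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


print(fastq_to_fasta([("@r1 sample", "ACGTACGT", "IIIIIIII")], width=4), end="")
print(dna_to_rna(reverse_complement("AACGTN")))
```

Reversing the quality string together with the sequence matters: without it, the per-base Phred scores would no longer line up with the reverse-complemented bases.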

Results
Conclusion