SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data.

Yan Peng, Nan Wang, Chaoyang Zhang, Andrew S Maxwell, Natalie D Barker, Jennifer G Laird, Ping Gong, Alan J Kennedy

doi:10.1186/1471-2105-15-s11-s10

Abstract

Background: While next-generation sequencing (NGS) technologies are rapidly advancing, an area that lags behind is the development of efficient and user-friendly tools for preliminary analysis of massive NGS data. As an effort to fill this gap to keep up with the fast pace of technological advancement and to accelerate data-to-results turnaround, we developed a novel software package named SeqAssist ("Sequencing Assistant" or SA).

Results: SeqAssist takes NGS-generated FASTQ files as the input, employs the BWA-MEM aligner for sequence alignment, and aims to provide a quick overview and basic statistics of NGS data. It consists of three separate workflows: (1) the SA_RunStats workflow generates basic statistics about an NGS dataset, including numbers of raw, cleaned, redundant and unique reads, redundancy rate, and a list of unique sequences with length and read count; (2) the SA_Run2Ref workflow estimates the breadth, depth and evenness of genome-wide coverage of the NGS dataset at a nucleotide resolution; and (3) the SA_Run2Run workflow compares two NGS datasets to determine the redundancy (overlapping rate) between the two NGS runs. Statistics produced by SeqAssist or derived from SeqAssist output files are designed to inform the user: whether, what percentage, how many times and how evenly a genomic locus (i.e., gene, scaffold, chromosome or genome) is covered by sequencing reads, how redundant the sequencing reads are in a single run or between two runs. These statistics can guide the user in evaluating the quality of a DNA library prepared for RNA-Seq or genome (re-)sequencing and in deciding the number of sequencing runs required for the library. We have tested SeqAssist using a synthetic dataset and demonstrated its main features using multiple NGS datasets generated from genome re-sequencing experiments.

Conclusions: SeqAssist is a useful and informative tool that can serve as a valuable "assistant" to a broad range of investigators who conduct genome re-sequencing, RNA-Seq, or de novo genome sequencing and assembly experiments.
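
The kinds of statistics the three workflows report can be illustrated with a small Python sketch. This is not SeqAssist's code and does not reproduce its file handling or its BWA-MEM alignment step. The function names (`run_stats`, `coverage_stats`, `run_overlap`) are hypothetical, and the formulas are assumptions made for illustration: redundant reads are counted as copies beyond the first occurrence of a sequence, evenness is approximated by the coefficient of variation of per-base depth, and the overlapping rate is taken as shared unique sequences divided by all unique sequences in either run. The abstract does not state which definitions SeqAssist itself uses.

```python
from collections import Counter
from statistics import mean, pstdev


def run_stats(reads):
    """Summarize one run from a list of cleaned read sequences (SA_RunStats-like)."""
    counts = Counter(reads)            # sequence -> number of reads with that sequence
    total = len(reads)
    unique = len(counts)               # number of distinct sequences
    redundant = total - unique         # assumption: copies beyond the first occurrence
    return {
        "cleaned_reads": total,
        "unique_reads": unique,
        "redundant_reads": redundant,
        "redundancy_rate": redundant / total if total else 0.0,
        # (sequence, length, read count), most frequent sequence first
        "unique_sequences": [(s, len(s), n) for s, n in counts.most_common()],
    }


def coverage_stats(depths):
    """Summarize per-base read depths for one locus (SA_Run2Ref-like)."""
    if not depths:
        return {"breadth": 0.0, "mean_depth": 0.0, "depth_cv": 0.0}
    covered = sum(1 for d in depths if d > 0)
    avg = mean(depths)
    return {
        "breadth": covered / len(depths),   # fraction of positions with depth >= 1
        "mean_depth": avg,                  # average depth over all positions
        # assumption: lower coefficient of variation means more even coverage
        "depth_cv": pstdev(depths) / avg if avg else 0.0,
    }


def run_overlap(reads_a, reads_b):
    """Fraction of unique sequences shared by two runs (SA_Run2Run-like)."""
    a, b = set(reads_a), set(reads_b)
    union = a | b
    # assumption: overlap measured as shared / all unique sequences
    return len(a & b) / len(union) if union else 0.0
```

For example, `run_stats(["ACGT", "ACGT", "ACGT", "TTGA"])` counts four cleaned reads, two of them unique, which gives a redundancy rate of 0.5 under the assumed definition. In practice the per-base depths passed to `coverage_stats` would come from an alignment of the reads to a reference, which this sketch leaves out.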

Journal: BMC Bioinformatics
Publication Date: Oct 21, 2014
Citations: 27
License type: CC BY
