TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.

Lindsay V Clark, Erik J Sacks

doi:10.1186/s13029-016-0057-7

Open Access

Abstract

Background: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and for estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.

Results: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that they can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.

Conclusions: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger runs on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.

Electronic supplementary material: The online version of this article (doi:10.1186/s13029-016-0057-7) contains supplementary material, which is available to authorized users.
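To make the core task concrete, here is a minimal Python sketch of extracting read counts from FASTQ data with user-supplied barcodes and tags. It is not TagDigger's own search algorithm. It assumes single-end reads in which the sample barcode is immediately followed by the tag sequence, matches tags exactly by prefix using dictionary lookups, and writes a samples-by-tags CSV table. The function names (`parse_fastq`, `count_tags`, `write_counts_csv`) and the CSV column layout are hypothetical.

```python
import csv
import itertools
from collections import Counter, defaultdict


def parse_fastq(lines):
    """Yield (read_id, sequence) pairs from FASTQ lines, one four-line record at a time."""
    it = iter(lines)  # explicit iterator so islice keeps advancing through the input
    while True:
        block = list(itertools.islice(it, 4))
        if not block:
            return  # clean end of input
        if len(block) < 4 or not block[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield block[0][1:].rstrip("\r\n"), block[1].rstrip("\r\n")


def count_tags(reads, barcodes, tags):
    """Count reads per sample and tag by exact prefix matching.

    reads    -- iterable of read sequences (str)
    barcodes -- dict mapping barcode sequence -> sample name
    tags     -- dict mapping tag sequence -> tag (allele) name
    Assumes each read is laid out as barcode + tag + remaining bases.
    """
    # Try longer lengths first so a longer exact match wins over a shorter one.
    bc_lengths = sorted({len(b) for b in barcodes}, reverse=True)
    tag_lengths = sorted({len(t) for t in tags}, reverse=True)
    counts = defaultdict(Counter)
    for seq in reads:
        sample, bc_len = None, 0
        for n in bc_lengths:
            sample = barcodes.get(seq[:n])
            if sample is not None:
                bc_len = n
                break
        if sample is None:
            continue  # no recognised barcode: read is skipped
        insert = seq[bc_len:]
        for m in tag_lengths:
            tag_name = tags.get(insert[:m])
            if tag_name is not None:
                counts[sample][tag_name] += 1
                break  # count each read toward at most one tag
    return counts


def write_counts_csv(counts, tag_names, handle):
    """Write a samples-by-tags table of read counts as CSV; absent counts are written as 0."""
    writer = csv.writer(handle)
    writer.writerow(["Sample"] + list(tag_names))
    for sample in sorted(counts):
        # Counter returns 0 for missing keys without inserting them
        writer.writerow([sample] + [counts[sample][t] for t in tag_names])
```

Feeding `count_tags` a generator such as `(seq for _, seq in parse_fastq(handle))` keeps only one record in memory at a time and writes no intermediate files. The per-read work depends on the number of distinct barcode and tag lengths rather than on the number of tags, which is one way such a search can scale. TagDigger's actual implementation may differ.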

Highlights

In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids.
Tag sequences can be imported into tagdigger_interactive.py and tag_manager.py in any of seven different formats. Four of these are the direct output of other single nucleotide polymorphism (SNP)-calling software: FASTA files from TASSEL-UNEAK, the sequence alignment/map (SAM) file used for generating markers in TASSEL-GBS, tab-delimited text output of the cstacks program in Stacks, and the .alleles file output from pyRAD.
For consistency with benchmarking tests performed on other software, 96 GB of RAM was available to the search algorithm, although it used less than 1 GB.
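The point that read depth matters for genotype quality and allele dosage in polyploids can be illustrated with a minimal sketch. Assuming a biallelic locus, a symmetric sequencing-error rate of 0.01, and a simple binomial model (all assumptions made here for illustration, not TagDigger's method), it scores each possible dosage from 0 to the ploidy and reports the log-likelihood gap to the runner-up as a rough confidence measure. The function names are hypothetical.

```python
import math


def dosage_log_likelihoods(ref, alt, ploidy, error=0.01):
    """Binomial log-likelihood of the observed read counts for each dosage 0..ploidy.

    Dosage d is the number of copies of the alternative allele out of `ploidy`.
    """
    depth = ref + alt
    log_comb = math.log(math.comb(depth, alt))  # identical for every dosage
    result = []
    for d in range(ploidy + 1):
        p = d / ploidy
        # Mix in a symmetric error rate so p is never exactly 0 or 1.
        p = p * (1 - error) + (1 - p) * error
        result.append(log_comb + alt * math.log(p) + ref * math.log(1 - p))
    return result


def best_dosage(ref, alt, ploidy, error=0.01):
    """Return (most likely dosage, log-likelihood margin over the runner-up)."""
    ll = dosage_log_likelihoods(ref, alt, ploidy, error)
    ranked = sorted(range(ploidy + 1), key=lambda d: ll[d], reverse=True)
    return ranked[0], ll[ranked[0]] - ll[ranked[1]]
```

For a tetraploid locus with 30 reference and 10 alternative reads, `best_dosage(30, 10, 4)` picks dosage 1 with a margin of about 5 log units, whereas `best_dosage(1, 1, 4)` picks dosage 2 with a margin under 0.3, so the low-depth call is nearly uninformative. Dedicated polyploid genotyping tools typically also model allelic bias and overdispersion.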

Journal: Source Code for Biology and Medicine
Publication date: Jul 11, 2016
Citations: 31
License type: CC BY

Similar Papers

RAD-seq data reveals robust phylogeny and morphological evolutionary history of Rhododendron
Yuanting Shen ... Yongpeng Ma
Horticultural Plant Journal, vol. 10, 15 Sep 2023

Opportunities for unlocking the potential of genomics for African trees
Barnabas H Daru ... Dave K Berger
The New Phytologist, vol. 210, 22 Dec 2015

Population Genomic Analysis of Model and Nonmodel Organisms Using Sequenced RAD Tags
Paul A Hohenlohe ... Julian Catchen
01 Jan 2012

A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data
Kimberly R Andrews ... Colby Gardner
Molecular Ecology Resources, vol. 18, 09 Jul 2018
