Venomix: a simple bioinformatic pipeline for identifying and characterizing toxin gene candidates from transcriptomic data.

Jason Macrander, Adam M Reitzel, Marymegan Daly, Jyothirmayi Panda, Daniel Janies

doi:10.7717/peerj.5361

Abstract

The advent of next-generation sequencing has resulted in transcriptome-based approaches to investigate functionally significant biological components in a variety of non-model organisms. This has given rise to “venomics”, a rapidly growing field that uses combined transcriptomic and proteomic datasets to characterize toxin diversity in a variety of venomous taxa. Ultimately, the transcriptomic portion of these analyses follows a very similar path after transcriptome assembly, often including candidate toxin identification using BLAST, expression level screening, protein sequence alignment, gene tree reconstruction, and characterization of potential toxin function. Here we describe the Python package Venomix, which streamlines these processes using common bioinformatic tools along with ToxProt, a publicly available annotated database composed of characterized venom proteins. In this study, we use the Venomix pipeline to characterize candidate venom diversity in four phylogenetically distinct organisms: a cone snail (Conidae; Conus sponsalis), a snake (Viperidae; Echis coloratus), an ant (Formicidae; Tetramorium bicarinatum), and a scorpion (Scorpionidae; Urodacus yaschenkoi). Data on these organisms were sampled from public databases, with each original analysis using different approaches for transcriptome assembly, toxin identification, or gene expression quantification. Venomix recovered numerically more candidate toxin transcripts for three of the four transcriptomes than the original analyses and identified new toxin candidates. In summary, we show that the Venomix package is a useful tool to identify and characterize the diversity of toxin-like transcripts derived from transcriptomic datasets. Venomix is available at: https://bitbucket.org/JasonMacrander/Venomix/.
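As an illustration of the candidate identification step named above, the following is a minimal sketch of how BLAST hits against a toxin database might be filtered and grouped. It is not taken from the Venomix source code. It assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`), and the function name `group_hits_by_toxin`, the e-value cutoff of 1e-5, and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def group_hits_by_toxin(blast_lines, max_evalue=1e-5):
    """Group transcripts by the toxin entry of their best BLAST hit.

    Hypothetical helper, not part of Venomix. Expects lines in BLAST
    tabular format (-outfmt 6), where column 1 is the query ID,
    column 2 the subject ID, and column 11 the e-value.
    """
    best = {}  # query ID -> (e-value, subject ID) of the best hit so far
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hit is too weak to count as a candidate
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, subject)

    # Invert the mapping: toxin entry -> sorted list of candidate transcripts
    groups = defaultdict(list)
    for query, (_, subject) in best.items():
        groups[subject].append(query)
    return {subject: sorted(queries) for subject, queries in groups.items()}


# Made-up example hits (transcript IDs and accessions are placeholders)
example_hits = [
    "tx1\tP01234\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-20\t90",
    "tx1\tP09999\t60.0\t50\t20\t0\t1\t50\t1\t50\t1e-8\t50",
    "tx2\tP01234\t70.0\t40\t12\t0\t1\t40\t1\t40\t1e-10\t60",
    "tx3\tP05555\t30.0\t20\t14\t0\t1\t20\t1\t20\t0.5\t20",
]
print(group_hits_by_toxin(example_hits))
```

Grouping transcripts under their best-matching toxin entry gives one set of candidates per toxin group, which is a convenient unit for the later alignment and gene tree steps. A real analysis would likely also consider alignment length, expression level, and the presence of a signal peptide.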

Highlights

Throughout the animal kingdom, venom has evolved independently multiple times to be used in prey capture, predatory defense, and intraspecific competition (Casewell et al., 2013).
Transcriptomes reassembled in Trinity (Grabherr et al., 2011) produced similar de novo assembly outputs when compared to the original studies (Table S1), with the only notable difference being in the number of transcripts for C. sponsalis, which may be due to repetitiveness and sequence complexity encountered during their initial assemblies (Phuong, Mahardika & Alfaro, 2016).
The transcriptome for T. bicarinatum was originally assembled using Velvet/Oases (Li & Durbin, 2009); we compared this to our Trinity assembly, choosing Trinity for its ease of use (Sanders et al., 2018), its frequent use in the venom literature (Macrander, Broe & Daly, 2015), and its lower redundancy and chimera rate (Yang & Smith, 2013).

Summary

Introduction

Throughout the animal kingdom, venom has evolved independently multiple times to be used in prey capture, predatory defense, and intraspecific competition (Casewell et al., 2013). For other more poorly studied taxonomic lineages, similar techniques are being used to evaluate venom diversity using bioinformatic pipelines for a particular species or taxonomic group (Tan, Khan & Brusic, 2003; Reumont et al., 2014; Macrander, Brugler & Daly, 2015; Kaas & Craik, 2015; Prashanth & Lewis, 2015). Although these take similar approaches to study diverse venoms across animal lineages, a streamlined systematic pipeline does not exist for rapid identification of candidate toxin genes from transcriptomic datasets regardless of their taxonomic lineage.

Methods

Results

Conclusion