Abstract

Venom-gland transcriptomics is a key tool in the study of the evolution, ecology, function, and pharmacology of animal venoms. In particular, gene-expression variation and coding sequences gained through transcriptomics provide key information for explaining functional venom variation over both ecological and evolutionary timescales. The accuracy and usefulness of inferences made through transcriptomics, however, are limited by the accuracy of the transcriptome assembly, which is a bioinformatic problem with several possible solutions. Several methods have been employed to assemble venom-gland transcriptomes, with the Trinity assembler being the most commonly applied among them. Although previous evidence of variation in performance among assembly software exists, particularly regarding recovery of difficult-to-assemble multigene families such as snake venom metalloproteinases, much work to date still employs a single assembly method. We evaluated the performance of several commonly used de novo assembly methods for the recovery of both nontoxin transcripts and complete, high-quality venom-gene transcripts across eleven snake and four scorpion transcriptomes. We varied k-mer sizes used by some assemblers to evaluate the impact of k-mer length on transcript recovery. We showed that the recovery of nontoxin transcripts and toxin transcripts is best accomplished through different assembly software, with SDT at smaller k-mer lengths and Trinity being best for nontoxin recovery, and a combination of SeqMan NGen and a seed-and-extend approach implemented in Extender being the best means of recovering a complete set of toxin transcripts. In particular, Extender was the only means tested capable of assembling multiple isoforms of the diverse snake venom metalloproteinase family, while traditional approaches such as Trinity recovered at most one metalloproteinase transcript. Our work demonstrated that traditional metrics of assembly performance are not predictive of performance in the recovery of complete and high-quality toxin genes. Instead, effective venom-gland transcriptomic studies should combine and quality-filter the results of several assemblers with varying algorithmic strategies.

Highlights

  • The traits mediating interactions between species have special ecological and evolutionary significance, making them the foci of studies of adaptation, innovation, and the mode and tempo of diversification [1,2,3]

  • Key Contribution: We found extensive variation in the number and identity of toxin transcripts recovered by different de novo transcriptome assembly software, particularly in the recovery of transcripts from large toxin gene families

  • This result held true for all snake and scorpion individuals, where snake transcriptomes assembled with SDT_k31 yielded an average of 2033 complete, single-copy nontoxin loci out of 3950 reference loci, and scorpion transcriptomes yielded 823 out of 1066 reference loci


Summary

Introduction

The traits mediating interactions between species have special ecological and evolutionary significance, making them the foci of studies of adaptation, innovation, and the mode and tempo of diversification [1,2,3]. Although uncovering the genetic basis of variability in these traits is often difficult, animal venoms, proteinaceous secretions from specialized glands that are directly injected into prey or enemies, present a genetically tractable model to map the progression from genotype to phenotype [4,5]. Venom compositional differences in many venomous taxa are closely linked to variable gene expression [5,8], which can be quantified by RNA-seq-based transcriptomics. Transcriptomes provide sequence data valuable for evolutionary rate assessments, gene tree construction, genome annotation, and improving the quality and completeness of proteomic characterizations of the venom itself [9,10]. The power of transcriptomics to provide these insights depends on the completeness and quality of the transcriptome assembly.

