Abstract
Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. Partly for this reason, most spider toxins encode a protective proregion that is excised from the mature peptide by enzymatic cleavage. To identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov models, and decision trees) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method performs as well as or better than current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, to test the accuracy of the predictor, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html), a web portal to a comprehensive relational database of spider toxins.
All training data, test data, and scripts used are available from the SpiderP website.
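The sensitivity and "8-mer specificity" figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that every 8-residue window of a precursor is a candidate that is labelled either cleavage or non-cleavage; that reading, the function names, and the toy sequence are assumptions made for illustration and are not taken from the SpiderP scripts.

```python
WINDOW = 8  # window length; the abstract reports "8-mer specificity"


def windows(seq, k=WINDOW):
    """Return (start, k-mer) for every length-k window in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def sensitivity_specificity(true_sites, predicted_sites, n_windows):
    """Score window-level predictions.

    true_sites and predicted_sites are sets of window start indices
    labelled as cleavage windows; n_windows is the total number of
    candidate windows that were scored.
    """
    tp = len(true_sites & predicted_sites)   # cleavage windows found
    fn = len(true_sites - predicted_sites)   # cleavage windows missed
    fp = len(predicted_sites - true_sites)   # false alarms
    tn = n_windows - tp - fn - fp            # correctly rejected windows
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Synthetic 20-residue string, NOT a real toxin precursor.
toy = "MKAAALLLEEERACCKGCCK"
candidates = windows(toy)

# Hypothetical labels: one true cleavage window, two predicted windows.
sens, spec = sensitivity_specificity({5}, {5, 9}, len(candidates))
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because a precursor contributes many non-cleavage windows but only one cleavage window, a single false alarm lowers window-level specificity only slightly, whereas a single missed site in a set of five peptides lowers sensitivity by 20 percentage points.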
Highlights
Spiders are the dominant insect killers and the most successful venomous animals on the planet
Most spider-venom peptides are produced as prepropeptide precursors containing N-terminal signal peptide and propeptide regions in addition to the C-terminal region that will become the mature toxin [3]
We evaluated several machine-learning strategies based on their ability to correctly identify propeptide cleavage sites in spider toxins
Summary
Spiders are the dominant insect killers and the most successful venomous animals on the planet. Their evolutionary success is due in large part to the evolution of exceedingly complex venoms [1], which have been predicted to contain as many as 10 million unique peptides [2]. Most spider-venom peptides are produced as prepropeptide precursors containing N-terminal signal peptide and propeptide regions in addition to the C-terminal region that will become the mature toxin [3]. Mature spider-venom peptides tend to be cysteine-rich, and the propeptide is always N-terminal to the cysteine-rich scaffold, which restricts the access of proteolytic enzymes [4]. Given the enormous number of spider toxins [2], the experimental approach of fractionating whole venom followed by amino acid sequencing is highly time-consuming and expensive, motivating the development of an in silico strategy.
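The prepropeptide layout described above (signal peptide, then propeptide, then mature toxin) can be sketched as a simple split of the precursor at two cleavage positions. The function name, the toy sequence, and the two positions below are hypothetical and chosen only for illustration; in practice the propeptide boundary is the quantity a predictor would have to supply.

```python
from typing import NamedTuple


class Precursor(NamedTuple):
    signal: str      # N-terminal signal peptide
    propeptide: str  # region excised by enzymatic cleavage
    mature: str      # C-terminal region that becomes the mature toxin


def split_precursor(seq, signal_end, propeptide_end):
    """Split a precursor at two 0-based cleavage positions.

    signal_end is the index of the first propeptide residue and
    propeptide_end is the index of the first mature-toxin residue.
    """
    if not 0 <= signal_end <= propeptide_end <= len(seq):
        raise ValueError("cleavage positions must be ordered and within seq")
    return Precursor(
        seq[:signal_end],
        seq[signal_end:propeptide_end],
        seq[propeptide_end:],
    )


# Synthetic 20-residue string, NOT a real toxin precursor.
toy = "MKAAALLLEEERACCKGCCK"
parts = split_precursor(toy, signal_end=8, propeptide_end=12)

# Count cysteines in the mature region, which tends to be cysteine-rich.
n_cys = parts.mature.count("C")
print(parts, n_cys)
```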