PTPD: predicting therapeutic peptides by deep learning and word2vec

Chuanyan Wu, Yusen Zhang, Yang De Marinis, Rui Gao

doi:10.1186/s12859-019-3006-z

Journal: BMC Bioinformatics
Publication Date: Sep 6, 2019
Citations: 80
License type: open-access

Affiliation: Lund University, Shandong University

Abstract

Background
In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. In this paper, we propose an effective computational model that uses deep learning and word2vec to predict therapeutic peptides (PTPD).

Results
Representation vectors of all k-mers were obtained through word2vec based on k-mer co-existence information. The original peptide sequences were then divided into k-mers using the windowing method and mapped to the input layer using the embedding vectors obtained by word2vec. Three types of filters in the convolutional layers, together with dropout and max-pooling operations, were applied to construct feature maps. These feature maps were concatenated and passed to a fully connected dense layer, in which rectified linear units (ReLU) and dropout operations were included to avoid over-fitting of PTPD. The classification probabilities were generated by a sigmoid function. PTPD was then validated on two datasets, an independent anticancer peptide dataset and a virulent protein dataset, on which it achieved accuracies of 96% and 94%, respectively.

Conclusions
PTPD identified novel therapeutic peptides efficiently, and it is suitable for application as a useful tool in therapeutic peptide design.
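
The Results paragraph says the k-mer vectors were learned by word2vec from k-mer co-existence information. The sketch below illustrates only the corpus-preparation side of that idea, under stated assumptions: each peptide becomes a "sentence" of overlapping k-mers produced by a stride-1 sliding window, and skip-gram (center, context) pairs within a small window are counted. The k-mer length (k = 3), the window size, the helper names, and the toy peptides are hypothetical choices, not values taken from PTPD.

```python
from collections import Counter


def kmers(seq, k=3):
    """Split a peptide into overlapping k-mers (stride-1 sliding window). k=3 is a hypothetical choice."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


peptides = ["GLFDIIKKIAESF", "ACDEFGHIK"]   # toy corpus, not PTPD data
sentences = [kmers(p) for p in peptides]    # one "sentence" of k-mer "words" per peptide

# Co-occurrence counts of k-mer pairs: the signal a skip-gram word2vec model learns from
cooc = Counter(pair for s in sentences for pair in skipgram_pairs(s, window=2))
print(cooc.most_common(3))
```

Lists of k-mer sentences like these are the kind of input an off-the-shelf word2vec implementation consumes; in gensim 4, for example, `Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)` trains skip-gram vectors. Whether PTPD used skip-gram or CBOW, and with which hyperparameters, is not stated here.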

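The rest of the Results paragraph describes the network itself. The following is a minimal, untrained inference pass in pure Python that mirrors that description under hypothetical sizes: each k-mer is embedded (a deterministic pseudo-random vector per k-mer stands in for the word2vec vectors), three filter widths are convolved over the k-mer rows and max-pooled over positions, the pooled features are concatenated into a ReLU dense layer, and a sigmoid gives the class probability. The filter widths, layer sizes, helper names, and the ReLU after each convolution are assumptions, and dropout is omitted because it only acts during training.

```python
import math
import random

# All sizes are illustrative assumptions, not PTPD's published settings.
K = 3                       # k-mer length
DIM = 8                     # embedding dimension
FILTER_WIDTHS = (2, 3, 4)   # three filter types (widths measured in k-mers)
N_FILTERS = 4               # filters per width
HIDDEN = 6                  # units in the dense layer


def kmers(seq, k=K):
    """Split a peptide into overlapping k-mers with a stride-1 sliding window."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq):
    """Map each k-mer to a DIM-dimensional vector.

    Stand-in for word2vec: a pseudo-random vector seeded by the k-mer string,
    so the same k-mer always gets the same vector. PTPD uses learned vectors.
    """
    rows = []
    for km in kmers(seq):
        r = random.Random(km)
        rows.append([r.uniform(-0.5, 0.5) for _ in range(DIM)])
    return rows


def init_params(seed=0):
    """Randomly initialise conv, dense and output weights (untrained)."""
    r = random.Random(seed)

    def vec(n):
        return [r.uniform(-0.5, 0.5) for _ in range(n)]

    conv = [[([vec(DIM) for _ in range(w)], 0.0) for _ in range(N_FILTERS)]
            for w in FILTER_WIDTHS]
    n_feat = len(FILTER_WIDTHS) * N_FILTERS
    dense = [(vec(n_feat), 0.0) for _ in range(HIDDEN)]
    return {"conv": conv, "dense": dense, "out": (vec(HIDDEN), 0.0)}


def conv_maxpool(x, filt, bias):
    """'Valid' 1-D convolution of one filter over the k-mer rows, ReLU, then max over positions."""
    width = len(filt)
    activations = []
    for start in range(len(x) - width + 1):
        s = bias + sum(w * v
                       for offset in range(width)
                       for w, v in zip(filt[offset], x[start + offset]))
        activations.append(max(0.0, s))
    return max(activations)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(seq, params):
    """Embed -> three conv widths + max-pool -> concatenate -> dense ReLU -> sigmoid."""
    x = embed(seq)
    if len(x) < max(FILTER_WIDTHS):
        raise ValueError("peptide too short for the widest filter")
    features = [conv_maxpool(x, filt, b)
                for filters in params["conv"] for filt, b in filters]
    hidden = [max(0.0, b + sum(w * f for w, f in zip(row, features)))
              for row, b in params["dense"]]
    out_w, out_b = params["out"]
    return sigmoid(out_b + sum(w * h for w, h in zip(out_w, hidden)))


params = init_params()
p = forward("GLFDIIKKIAESF", params)  # arbitrary example sequence; weights are untrained
print(f"P(positive class) = {p:.3f}")
```

A practical implementation would use a deep learning framework (for example Keras `Conv1D` and `GlobalMaxPooling1D` layers) and train the weights; for a sigmoid output, binary cross-entropy is the usual loss. The loop-based version only makes the data flow explicit.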
Highlights

In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases.
PTPD identified novel therapeutic peptides efficiently, and it is suitable for application as a useful tool in therapeutic peptide design.
Reported machine learning algorithms for predicting virulent proteins include support vector machine (SVM)-based models built on amino acid composition (AAC) and dipeptide component (DPC) [16], an ensemble of SVM-based models trained with features extracted directly from amino acid sequences [17], a bi-layer cascade SVM model [18], and a model based on an SVM and a variant of input decimated ensembles and their random subspace [19].

Summary

Introduction

In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. We propose PTPD, an effective computational model that uses deep learning and word2vec to predict therapeutic peptides. Novel computational models based on machine learning have been applied to identify virulent proteins in infection pathophysiology. Reported machine learning approaches for predicting virulent proteins include SVM-based models built on AAC and DPC features [16], an ensemble of SVM-based models trained with features extracted directly from amino acid sequences [17], a bi-layer cascade SVM model [18], and a model based on an SVM and a variant of input decimated ensembles and their random subspace [19]. A computational tool based on q-Wiener graph indices was also proposed to predict virulent proteins effectively [10]. Despite this substantial progress, identifying specific peptides from massive protein databases remains challenging.
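
Several of the earlier SVM models cited above [16] take AAC and DPC feature vectors as input. As a hedged sketch of what such features usually look like: AAC is the 20-dimensional vector of residue frequencies, and DPC the 400-dimensional vector of frequencies of adjacent residue pairs. The exact normalisation used in [16] is not given here, so dividing by the total count is an assumption, and non-standard residue letters are simply skipped.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 features)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for a in seq:
        if a in counts:          # skip non-standard letters such as X or B
            counts[a] += 1
    n = sum(counts.values()) or 1
    return [counts[a] / n for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide features: fraction of each adjacent residue pair (400 features)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    n = sum(counts.values()) or 1
    return [counts[d] / n for d in DIPEPTIDES]


# A 420-dimensional feature vector of the kind an SVM could be trained on
features = aac("MKVLAA") + dpc("MKVLAA")
print(len(features))
```

Unlike PTPD's learned k-mer embeddings, these hand-crafted vectors have a fixed length regardless of peptide length and discard residue order beyond adjacent pairs.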

Methods

Results

Discussion

Conclusion