Identification of Phage Viral Proteins With Hybrid Sequence Features.

Xiaoqing Ru, Lihong Li, Chunyu Wang

doi:10.3389/fmicb.2019.00507

Open Access

https://doi.org/10.3389/fmicb.2019.00507

Abstract

The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of bacteriophage virion proteins is the main area of interest. It is therefore important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, that is, the set with the strongest correlation between features and class labels and low redundancy among features. Because the random forest algorithm randomly samples both the training instances and the candidate variables at each node, a random forest classifier is employed and evaluated with 10-fold cross-validation on the bacteriophage protein classification task. The model reaches an accuracy of 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
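The feature selection step described above can be illustrated with a minimal sketch. It assumes that relevance is measured by the absolute Pearson correlation between a feature and the class label and that redundancy is measured by the mean Euclidean distance between feature columns; the function names (`pearson`, `euclidean`, `mrmd_rank`) and the toy data are hypothetical and are not taken from the paper or from any MRMD software.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def euclidean(xs, ys):
    """Euclidean distance between two feature columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def mrmd_rank(X, y):
    """Return feature indices sorted by relevance + mean distance, best first.

    Assumption: features are already scaled to a comparable range, since the
    Euclidean distance term is scale dependent.
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    scores = []
    for j, col in enumerate(cols):
        relevance = abs(pearson(col, y))
        others = [euclidean(col, cols[k]) for k in range(n_features) if k != j]
        distance = sum(others) / len(others) if others else 0.0
        scores.append((relevance + distance, j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: feature 0 tracks the label, feature 1 is noise, feature 2 is constant.
X = [[0.0, 0.1, 0.5],
     [0.0, 0.9, 0.5],
     [1.0, 0.2, 0.5],
     [1.0, 0.8, 0.5]]
y = [0, 0, 1, 1]
ranking = mrmd_rank(X, y)
```

In this sketch the top-ranked features would be kept and passed to the classifier; how many to keep is a separate choice that the sketch does not make.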

Highlights

In the biological world, bacteriophages are ubiquitous, with different genomes and lifestyles
Our results show that, among the eight physicochemical properties of amino acids, the charge property has the greatest influence on the classification of bacteriophage proteins
Based on the feature extraction methods described in the Feature extraction section, we extracted 188-dimensional and 400-dimensional feature sets based on sequence information, and a 473-dimensional feature set based on sequence and secondary structure information, representing the entire bacteriophage protein sequence dataset
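To make the idea of a fixed-length sequence feature concrete, the sketch below computes dipeptide composition, one common way to obtain a 400-dimensional vector (20 x 20 amino acid pairs) from a protein sequence. Whether the 400-dimensional feature set in this study was built exactly this way is an assumption; the names `AMINO_ACIDS`, `DIPEPTIDES`, and `dipeptide_composition` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the frequency of each of the 400 dipeptides in the sequence.

    Pairs containing non-standard letters (for example 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Example: the overlapping pairs in "ACAC" are AC, CA, AC.
vec = dipeptide_composition("ACAC")
```

Every sequence maps to a vector of the same length regardless of how long the protein is, which is what allows such features to be fed to a standard classifier.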

Summary

INTRODUCTION

Bacteriophages are ubiquitous, with different genomes and lifestyles. Faced with a large volume of data, traditional biological experimental methods can no longer keep up with the post-gene era (Chen W. et al., 2016; Cheng et al., 2019; Mrozek et al., 2016; Hu et al., 2018). For this reason, researchers have introduced different machine learning algorithms into bacteriophage classification and prediction research. The random forest algorithm (Breiman, 2001; Yao et al., 2017) combines multiple weak classifiers to produce a final result that has higher accuracy and better generalization performance.
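The way an ensemble combines weak classifiers can be sketched as a majority vote over their individual predictions. This is only an illustration of the aggregation step, assuming each classifier has already produced one label per sample; it does not train any trees, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample.

    Each inner list holds one classifier's predictions for all samples.
    On a tie, the label that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three weak classifiers voting on three samples (1 = virion, 0 = non-virion).
votes = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
final = majority_vote(votes)
```

Each classifier is wrong on a different sample here, yet the combined vote can still be consistent, which is the intuition behind the improved accuracy of the ensemble.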

METHODS

EXPERIMENTS

Findings

CONCLUSION

Journal: Frontiers in Microbiology
Publication Date: Mar 26, 2019
Citations: 12
License type: CC BY 4.0
