Abstract

Cytokine proteins, which form a complex regulatory network, participate in a variety of important physiological functions of the human body. Identifying cytokine proteins is therefore important and has attracted the attention of many researchers. In this paper, we propose an MRMD-cosine model based on PseKRAAC features to identify cytokine proteins. First, the PseKRAAC feature extraction method is used to extract four feature sets from the cytokine proteins: the type1 g-gap, type1 lambda, type2 g-gap and type2 lambda feature sets. Then the MRMD algorithm is used to remove redundant features from each feature set. MRMD measures the redundancy of a feature set with one of three metrics: the Euclidean distance, the cosine similarity or the Tanimoto coefficient. Bagging and random forest algorithms are then used to construct classification models on the compressed feature sets. The experimental results show that the MRMD-cosine model built on the type1 lambda feature set with the random forest algorithm achieves the best performance among all models. Finally, we compare the MRMD-cosine model with a state-of-the-art model, the greedy-based feature compression model built on the CNT features. The comparison shows that the MRMD-cosine model achieves better accuracy while using only 15% as many features as the greedy-based model.

Highlights

  • Cytokines are low-molecular-weight soluble proteins induced by immunogens, mitogens or other stimulants

  • In [36], a greedy-based feature compression model built on the CNT feature set is proposed to classify cytokine proteins

  • MRMD uses three metrics to evaluate the redundancy of the features in a feature set: the Euclidean distance, the cosine similarity and the Tanimoto coefficient


Summary

INTRODUCTION

Cytokines are low-molecular-weight soluble proteins induced by immunogens, mitogens or other stimulants. In [36], a greedy-based feature compression model built on the CNT feature set is proposed to classify cytokine proteins. We use the PseKRAAC methods [54] to extract features from the cytokine proteins and construct classification models on them. Two machine learning algorithms, bagging and random forest, are used to construct the classification models that identify cytokines from the compressed feature sets. An MRMD-cosine model built on the type1 lambda feature set with the random forest algorithm achieves the best performance among all models. We compare the performance of the MRMD-cosine model with that of the greedy-based model built on the CNT feature set. (1) An MRMD-cosine model built on the type1 lambda feature set with the random forest algorithm is proposed to classify cytokine proteins.
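To make the idea of reduced-alphabet features concrete, the following is a minimal sketch of a gapped pair composition computed over a reduced amino acid alphabet. It illustrates only the general idea and is not the PseKRAAC definition from [54]: the five-group clustering `CLUSTERS`, the function names and the `gap` parameter are all assumptions introduced for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-group reduction of the 20 standard amino acids.
# This grouping is illustrative only, not one of the PseKRAAC cluster types.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
RESIDUE_TO_GROUP = {
    aa: str(idx) for idx, group in enumerate(CLUSTERS) for aa in group
}
GROUP_LABELS = [str(idx) for idx in range(len(CLUSTERS))]


def reduce_sequence(seq):
    """Map each residue to its group label, skipping unknown symbols."""
    return "".join(
        RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP
    )


def gapped_pair_composition(seq, gap=0):
    """Return normalized frequencies of group pairs separated by `gap` residues.

    The vector has len(CLUSTERS) ** 2 entries in a fixed label order,
    so every sequence maps to a feature vector of the same length.
    """
    reduced = reduce_sequence(seq)
    step = gap + 1
    pairs = [reduced[i] + reduced[i + step] for i in range(len(reduced) - step)]
    counts = Counter(pairs)
    total = len(pairs)
    labels = ["".join(p) for p in product(GROUP_LABELS, repeat=2)]
    return [counts[label] / total if total else 0.0 for label in labels]
```

With five groups, each sequence yields a 25-dimensional vector regardless of its length, which is what lets a feature selection step and a classifier operate on proteins of different sizes.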

METHODS
MRMD FEATURE COMPRESSION ALGORITHM
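
The abstract names three measures that MRMD uses to gauge redundancy between features: the Euclidean distance, the cosine similarity and the Tanimoto coefficient. A minimal sketch of the three measures for a pair of feature columns is given below, using their standard definitions. The function names are illustrative, and how MRMD combines these values into a feature ranking is not shown here.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two columns; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tanimoto_coefficient(a, b):
    """Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b); 0.0 if undefined."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0
```

Two identical columns give a Euclidean distance of 0 and a cosine similarity and Tanimoto coefficient of 1, the signature of a fully redundant pair. Note that the cosine similarity ignores scale (a column and its multiple still score 1), whereas the Tanimoto coefficient does not.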
Findings
CONCLUSION
