A method for identifying moonlighting proteins based on linear discriminant analysis and bagging-SVM.

Yu Chen, Jifeng Guo, Sai Li

doi:10.3389/fgene.2022.963349

Open Access

Journal: Frontiers in Genetics
Publication Date: Aug 15, 2022
Citations: 3
License type: CC BY 4.0

Affiliation: Northeast Forestry University

Abstract

Moonlighting proteins have at least two independent functions and are widely found in animals, plants, and microorganisms. They play important roles in signal transduction, cell growth and movement, tumor inhibition, DNA synthesis and repair, and the metabolism of biological macromolecules. Because moonlighting proteins are difficult to find through biological experiments, many researchers identify them with bioinformatics methods, but the accuracy of these methods is relatively low. We therefore propose a new method. In this study, we use SVMProt-188D features as input, combine linear discriminant analysis with several basic machine learning classifiers, and apply a bagging ensemble to the best-performing classifier, the support vector machine. The resulting model identifies moonlighting proteins accurately and efficiently: it achieves an accuracy of 93.26% and an F-score of 0.946 on the MPFit dataset, outperforming the existing MEL-MP model. It also achieves good results on two other moonlighting protein datasets.
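The pipeline outlined in the abstract (188-dimensional features, linear discriminant analysis, then a bagging ensemble of support vector machines) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The synthetic data from `make_classification` stands in for real SVMProt-188D vectors, and the standardization step, the RBF kernel, the number of bagged estimators, and the train/test split are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 188-dimensional protein feature vectors
# (assumption: real SVMProt-188D features would be loaded here instead).
X, y = make_classification(
    n_samples=400, n_features=188, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features, project onto the single LDA axis available for a
# two-class problem, then classify with a bagged ensemble of SVMs.
# Kernel and n_estimators are illustrative choices, not the paper's settings.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=1),
    BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f_score = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f}  F-score={f_score:.3f}")
```

The printed scores describe the synthetic data only and say nothing about the results reported for MPFit. For a two-class problem, LDA can produce at most one discriminant component, which is why the SVMs in this sketch operate on a single projected feature.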

Full Text