Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms

Xinyi Qin, Min Liu, Lu Zhang, Guangzhong Liu

doi:10.1016/j.compbiolchem.2021.107456

Abstract

Understanding protein function supports research in advanced fields such as gene therapy and the design and development of new drugs. A prerequisite for understanding a protein's function is determining its tertiary structure. Protein structure classification is indispensable to this problem, and fold recognition is a commonly used approach to it. In this work, protein sequences from the ASTRAL protein classification database, filtered at 40% sequence identity, are used to predict 27 fold types, most of which belong to four protein structural classes: α, β, α/β and α + β. We extract features based on secondary-structure and evolutionary information (DSSP, PSSM and HMM profiles), converting each protein sequence into a feature vector that a machine learning algorithm can process. A LightGBM-based feature ranking combined with incremental feature selection (IFS) is then used to find the optimal classifier built with each of three tree-based algorithms: Random Forest, XGBoost and LightGBM. Finally, Bayesian optimization is used to tune the hyperparameters of these algorithms, raising fold recognition accuracy to 93.45%. The proposed model therefore achieves outstanding results in protein fold recognition.
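The abstract names incremental feature selection (IFS) driven by a LightGBM importance ranking. Below is a minimal Python sketch of that general idea, not the authors' implementation. It assumes the feature importances have already been computed (for example from `LGBMClassifier.feature_importances_` in LightGBM's scikit-learn API) and that `evaluate` is a hypothetical callable returning a score, such as cross-validated accuracy, for a classifier trained on a given feature subset. The helper names `rank_by_importance` and `incremental_feature_selection` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_importance(importances: Sequence[float]) -> List[int]:
    """Return feature indices sorted from most to least important."""
    return sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)


def incremental_feature_selection(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float, List[Tuple[int, float]]]:
    """Score the top-1, top-2, ..., top-n ranked features and keep the best subset.

    Returns the best subset, its score, and the (k, score) curve for plotting.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    curve: List[Tuple[int, float]] = []
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)  # assumed: e.g. cross-validated accuracy
        curve.append((k, score))
        # Strict ">" keeps the smaller subset when two sizes tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


if __name__ == "__main__":
    # Hypothetical toy inputs: importances for 4 features, and an evaluator
    # that rewards features 1 and 3 and penalizes subset size.
    importances = [0.1, 0.5, 0.05, 0.3]
    useful = {1, 3}

    def toy_evaluate(subset: List[int]) -> float:
        return len(useful & set(subset)) - 0.1 * len(subset)

    ranked = rank_by_importance(importances)
    best, score, curve = incremental_feature_selection(ranked, toy_evaluate)
    print(ranked)  # [1, 3, 0, 2]
    print(best)    # [1, 3]
```

In a real pipeline, `evaluate` would train a Random Forest, XGBoost or LightGBM model on the selected columns. The ranking is computed once and reused, so the loop costs one model evaluation per candidate subset size.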

Journal: Computational Biology and Chemistry
Publication Date: Feb 12, 2021
Citations: 9

Similar Papers

What are the baselines for protein fold recognition?
Liam J Mcguffin ... Kevin Bryson
Bioinformatics | VOL. 17 | 01 Jan 2001

Pocketome via Comprehensive Identification and Classification of Ligand Binding Envelopes
Jianghong An ... Ruben Abagyan
Molecular & Cellular Proteomics | VOL. 4 | 01 Jun 2005

ASFold-DNN: Protein Fold Recognition Based on Evolutionary Features With Variable Parameters Using Full Connected Neural Network
Xinyi Qin ... Guangzhong Liu
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 19 | 01 Sep 2022

The recognition of multi-class protein folds by adding average chemical shifts of secondary structure elements
Zhenxing Feng ... Muhammad Aqeel Ashraf
Saudi Journal of Biological Sciences | VOL. 23 | 11 Dec 2015
