New Heuristic Methods for Protein Model Quality Assessment via Two-Stage Machine Learning and Hierarchical Ensemble

Junlin Wang, Yi Shang, Dong Xu, Wenbo Wang

doi:10.1109/cogmi56440.2022.00022

Abstract

Computational protein structure prediction is an important problem in bioinformatics, and the ability to accurately evaluate the quality of predicted protein models is of significant interest. In this paper, three new single-model quality assessment (QA) methods, MMQA-1, MMQA-2, and MMQA-HE, are proposed based on two-stage machine learning and hierarchical ensemble techniques. MMQA-1 and MMQA-2 train different machine learning models in two separate stages. They divide the entire feature set into two groups and use completely different feature sets and training data in each stage to train a predictive model. MMQA-HE is an ensemble method that combines individual models not only at the tree level, but also at the forest level. In CASP14, MMQA-1 ranked No. 2 in terms of average GDT-TS difference. MMQA-2 and MMQA-HE improve on MMQA-1 and outperform existing state-of-the-art QA methods across multiple QA performance metrics.
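The two ideas named in the abstract, stage-specific feature groups and training data, and averaging at both the tree level and the forest level, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, one-split stumps stand in for real decision trees, the feature grouping is arbitrary, and because the abstract does not say how the two stages are connected, the two stage forests are simply averaged.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature):
    """Toy stand-in for a regression tree: one split on one feature."""
    threshold = mean(r[feature] for r in rows)
    left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
    overall = mean(targets)
    left_value = mean(left) if left else overall
    right_value = mean(right) if right else overall
    return lambda r: left_value if r[feature] <= threshold else right_value

def fit_forest(rows, targets, features, n_trees, rng):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.choice(features)
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feature))
    return trees

def predict_forest(trees, row):
    """Tree-level combination: average the trees of one forest."""
    return mean(tree(row) for tree in trees)

def predict_hierarchical(forests, row):
    """Forest-level combination: average the per-forest predictions."""
    return mean(predict_forest(trees, row) for trees in forests)

# Synthetic data: 4 hypothetical features per model, quality score in [0, 1].
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(200)]
targets = [0.5 * r[0] + 0.5 * r[2] for r in rows]

# Assumed split: each stage gets its own feature group and its own data.
group_a, group_b = [0, 1], [2, 3]
stage1 = fit_forest(rows[:100], targets[:100], group_a, n_trees=25, rng=rng)
stage2 = fit_forest(rows[100:], targets[100:], group_b, n_trees=25, rng=rng)

forests = [stage1, stage2]
query = [0.9, 0.1, 0.8, 0.3]
score = predict_hierarchical(forests, query)
print(f"predicted quality score: {score:.3f}")
```

Averaging in two levels gives each forest equal weight regardless of how many trees it contains, which differs from pooling all trees into one average when the forests have different sizes.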

Full Text