Feature Importance in the Quality of Protein Templates

Muhamed Adilovic, Altijana Hromic-Jahjefendic

doi:10.21533/pen.v9i2.1830.g817

Abstract

Proteins are a focus of research because of their importance as biological catalysts in various cellular processes and diseases. Since the experimental study of proteins is time-consuming and expensive, in silico prediction and analysis of proteins is common. Template-based prediction is the most reliable approach, so the aim of this study is to analyze how important the primary features of proteins are for their quality scores. Statistical analysis shows that protein models with a resolution below 3 Å or an R value below 0.25 have higher quality scores than their respective counterparts when each feature is compared individually. Random forest analysis likewise identifies resolution as the most important feature, while the other features have lower but moderate importance scores. The exception is the presence of a ligand in protein models, which has no effect on global protein quality scores in either the statistical or the machine learning analysis.
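The threshold comparison summarized above (splitting models at a resolution of 3 Å or an R value of 0.25 and comparing quality scores between the two groups) can be illustrated with a short sketch. The abstract does not name the statistical test used; this sketch assumes a Mann-Whitney U comparison, and the field names (`resolution`, `r_value`, `quality`), the helper functions, and the toy records are all hypothetical, not data or code from the study.

```python
from statistics import mean


def split_by_threshold(models, feature, threshold):
    """Split models into quality scores below the threshold and at/above it."""
    below = [m["quality"] for m in models if m[feature] < threshold]
    above = [m["quality"] for m in models if m[feature] >= threshold]
    return below, above


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a versus sample b.

    Counts pairs (x from a, y from b) where x > y; ties count as 0.5.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical toy records for illustration only, NOT data from the study.
models = [
    {"resolution": 1.8, "r_value": 0.19, "quality": 0.82},
    {"resolution": 2.4, "r_value": 0.22, "quality": 0.77},
    {"resolution": 2.9, "r_value": 0.24, "quality": 0.71},
    {"resolution": 3.2, "r_value": 0.27, "quality": 0.64},
    {"resolution": 3.8, "r_value": 0.30, "quality": 0.58},
]

# Compare each feature individually, as the abstract describes.
for feature, threshold in [("resolution", 3.0), ("r_value", 0.25)]:
    below, above = split_by_threshold(models, feature, threshold)
    u = mann_whitney_u(below, above)
    n_pairs = len(below) * len(above)
    # u / n_pairs estimates P(a "below" model scores higher than an "above" one).
    print(feature, round(mean(below), 3), round(mean(above), 3), u / n_pairs)
```

A ratio `u / n_pairs` near 1 means models below the threshold almost always score higher; a real analysis would also compute a p-value (for example with a statistics library) rather than rely on the raw statistic.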
