Chapter 1 - Random forest method for predicting protein ligand–binding residues

Peng Chen, Bing Wang, Jun Zhang, Xin Gao

doi:10.1016/b978-0-12-824386-2.00003-1

Abstract

Protein–ligand binding is important for some proteins to perform their functions. Protein–ligand binding sites are the residues of a protein that physically bind to ligands. Despite recent advances in the computational prediction of protein–ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is often unavailable. This chapter proposes a sequence-based approach to identifying protein–ligand binding residues. We propose a combination technique that reduces the effect of using different sliding residue windows when encoding input vectors. Moreover, because ligand-binding sites and non-ligand-binding sites are highly imbalanced, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein–ligand binding site predictor. Experimental results on CASP9 and CASP8 targets demonstrate that our method compares favorably with the state of the art.
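The data preparation outlined in the abstract can be illustrated with a minimal sketch. It shows only three generic steps: extracting a sliding residue window around each position, building several balanced training sets by undersampling the non-binding class, and averaging the scores of several classifiers. All function names, the padding character "X", and the use of score averaging are assumptions made for illustration; the abstract does not specify how windows are combined or how the ensemble votes, and RF training itself is left out (any trained scorer can be plugged in).

```python
import random
from typing import Callable, List, Sequence, Tuple


def sliding_windows(sequence: str, size: int, pad: str = "X") -> List[str]:
    """Return one window of `size` residues centered on each position.

    The padding character is an assumption; it fills positions that
    fall off either end of the sequence.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def balanced_subsets(
    positives: Sequence,
    negatives: Sequence,
    n_subsets: int,
    seed: int = 0,
) -> List[Tuple[list, list]]:
    """Pair all positives with an equally sized random draw of negatives.

    Assumes the positives (binding residues) are the minority class.
    """
    if len(negatives) < len(positives):
        raise ValueError("expected at least as many negatives as positives")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(list(negatives), len(positives))
        subsets.append((list(positives), drawn))
    return subsets


def ensemble_score(scorers: Sequence[Callable[[object], float]], x: object) -> float:
    """Average the scores of several classifiers (hypothetical voting rule)."""
    return sum(scorer(x) for scorer in scorers) / len(scorers)
```

In this sketch each balanced subset would be used to train one classifier, and the resulting scorers would be passed to `ensemble_score`. Calling `sliding_windows` with several odd sizes yields the per-size windows that a combination scheme could then merge.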

Full Text