A case-based meta-learning algorithm boosts the performance of structure-based virtual screening

Xi Yun, Lei Xie, Susan L Epstein, Weiwei Han

doi:10.1109/bibm.2013.6732464

Abstract

Virtual screening based on protein-ligand docking is widely applied in the early stages of drug discovery. The scoring functions provided by existing protein-ligand docking tools, however, often distinguish bioactive compounds from inactive ones poorly. As a result, considerable effort has been devoted to combining multiple scoring functions for more reliable evaluation. State-of-the-art consensus scoring and ensemble learning methods assume that each scoring function performs uniformly across all cases. Case-based meta-learning (CBML), the method we have developed, is fundamentally different. For each new case, it identifies the predictor expected to perform best, based on the new case's similarity to old cases, and uses that predictor alone to make the prediction rather than averaging over all predictors. Our large-scale benchmark studies indicate that CBML outperforms consensus-based scoring and significantly improves the performance of structure-based virtual screening. The CBML paradigm can be extended to other applications in bioinformatics and chemoinformatics for robust and reliable predictive modeling.
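The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the case representation (a numeric feature vector), the cosine similarity measure, the neighbor count `k`, the similarity-weighted vote, and the predictor names `scorer_a` and `scorer_b` are all assumptions made for illustration. The sketch only shows the contrast between picking one predictor per case from similar past cases and averaging all predictors.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence


@dataclass
class Case:
    """A past case (hypothetical representation, not from the paper)."""
    features: Sequence[float]      # assumed descriptor vector for the case
    performance: Dict[str, float]  # assumed per-predictor performance on it


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the paper's similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_predictor(new_features: Sequence[float],
                     case_base: List[Case],
                     k: int = 3) -> str:
    """Return the name of the predictor that did best on the k most similar
    past cases, weighting each case by its similarity to the new one."""
    if not case_base:
        raise ValueError("case base is empty")
    # Compute each similarity once, then keep the k most similar cases.
    scored = [(cosine_similarity(new_features, c.features), c)
              for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    totals: Dict[str, float] = {}
    for similarity, case in scored[:k]:
        weight = max(similarity, 0.0)  # ignore negatively correlated cases
        for name, perf in case.performance.items():
            totals[name] = totals.get(name, 0.0) + weight * perf
    return max(totals, key=lambda name: totals[name])


def consensus_score(scores: Dict[str, float]) -> float:
    """Baseline for contrast: plain average over all predictors' scores."""
    return sum(scores.values()) / len(scores)
```

In this sketch, a new case that resembles past cases where `scorer_a` performed well is routed to `scorer_a` only, whereas `consensus_score` would blend every predictor's output regardless of the case.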

Full Text