Computational prediction of plasma protein binding of cyclic peptides from small molecule experimental data using sparse modeling techniques

Yasushi Yoshikawa, Keisuke Yanagisawa, Yutaka Akiyama, Takashi Tajimi, Naoki Wakui, Masahito Ohue

doi:10.1186/s12859-018-2529-z

Abstract

Background: Cyclic peptide-based drug discovery is attracting increasing interest owing to its potential to avoid target protein depletion. In drug discovery, it is important to maintain the biostability of a drug within the proper range. Plasma protein binding (PPB) is the most important index of biostability, and developing a computational method to predict the PPB of drug candidate compounds contributes to the acceleration of drug discovery research. PPB prediction for small molecule drug compounds using machine learning has been conducted thus far; however, no study has investigated cyclic peptides because experimental information on cyclic peptides is scarce.

Results: We adopted sparse modeling and small molecule information to construct a PPB prediction model for cyclic peptides. As cyclic peptide data are limited, applying multidimensional nonlinear models raises concerns regarding overfitting. Models constructed by sparse modeling, however, can avoid overfitting, offering high generalization performance and interpretability. PPB data for more than 1000 small molecules are available, and we used them to construct prediction models with two enumeration methods: enumerating lasso solutions (ELS) and forward beam search (FBS). The accuracies of the prediction models constructed by ELS and FBS were equal to or better than those of conventional non-linear models (MAE = 0.167–0.174) on cross-validation of a small molecule compound dataset. Moreover, we showed that the prediction accuracies for cyclic peptides were close to those for small molecule compounds (MAE = 0.194–0.288). Such high accuracy could not be obtained by the simple approach of learning directly from cyclic peptide data with lasso regression (MAE = 0.286–0.671) or ridge regression (MAE = 0.244–0.354).

Conclusion: In this study, we proposed a machine learning technique that uses low-dimensional sparse modeling to predict the PPB value of cyclic peptides computationally. The low-dimensional sparse model not only exhibits excellent generalization performance but also improves the interpretability of the prediction model. This can provide common and noteworthy knowledge for future cyclic peptide drug discovery studies.
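The idea of a sparse linear model evaluated by mean absolute error (MAE) can be illustrated with a minimal sketch. The code below is not the authors' ELS or FBS implementation; it is plain lasso regression solved by coordinate descent, written in pure Python. The function names, the two toy descriptors, the target values, and the penalty `alpha = 0.05` are all assumptions made for illustration and do not come from the paper's data.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/2n) * ||y - Xw - b||^2 + alpha * ||w||_1.

    X is a list of rows, y a list of targets. Features and targets are
    centered internally so the intercept b can be recovered at the end.
    """
    n, d = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(d)] for row in X]
    w = [0.0] * d
    residual = [yi - y_mean for yi in y]  # residual for w = 0
    for _ in range(n_sweeps):
        for j in range(d):
            col = [row[j] for row in Xc]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            residual = [r - delta * v for v, r in zip(col, residual)]
            w[j] = new_wj
    intercept = y_mean - sum(wj * m for wj, m in zip(w, col_means))
    return w, intercept


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): the target depends only on the first descriptor;
# the second descriptor is uninformative by construction.
X = [[0.0, 1.0], [1.0, -1.0], [2.0, -1.0], [3.0, 1.0]]
y = [0.1, 0.6, 1.1, 1.6]

w, b = lasso_coordinate_descent(X, y, alpha=0.05)
predictions = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
train_mae = mae(y, predictions)
print("weights:", w, "intercept:", b, "MAE:", train_mae)
```

In this toy setting the L1 penalty sets the weight of the uninformative descriptor to exactly zero, which is the property that makes sparse models easier to interpret than dense ones such as ridge regression.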

Highlights

Cyclic peptide-based drug discovery is attracting increasing interest owing to its potential to avoid target protein depletion
From each of the two feature selection methods, the model with the smallest root mean squared error (RMSE) on the test data was selected as the proposed model, on the assumption that the model that best predicts unknown data also best explains plasma protein binding (PPB)
Enumeration methods were utilized to predict PPB values of cyclic peptides with the model trained on experimental PPB data of small molecules
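The selection rule in the highlights (keep the candidate model with the smallest RMSE on test data) can be sketched as follows. This is a simplified stand-in, not the paper's procedure: each candidate here is a one-descriptor least-squares model rather than a multi-feature model enumerated by ELS or FBS, and the descriptor names `descriptor_a` and `descriptor_b`, the helper names, and all numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_single_feature(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x


def select_best_model(train_X, train_y, test_X, test_y):
    """Fit one candidate per descriptor; return the one with lowest test RMSE.

    train_X and test_X map a descriptor name to its list of values.
    Returns a tuple (test_rmse, test_mae, descriptor_name, (slope, intercept)).
    """
    candidates = []
    for name, values in train_X.items():
        slope, intercept = fit_single_feature(values, train_y)
        preds = [slope * v + intercept for v in test_X[name]]
        candidates.append(
            (rmse(test_y, preds), mae(test_y, preds), name, (slope, intercept))
        )
    return min(candidates, key=lambda c: c[0])


# Toy data (hypothetical): descriptor_a determines the target,
# descriptor_b does not.
train_X = {
    "descriptor_a": [0.0, 1.0, 2.0, 3.0],
    "descriptor_b": [1.0, -1.0, 1.0, -1.0],
}
train_y = [0.1, 0.6, 1.1, 1.6]
test_X = {"descriptor_a": [4.0, 5.0], "descriptor_b": [1.0, -1.0]}
test_y = [2.1, 2.6]

best = select_best_model(train_X, train_y, test_X, test_y)
print("selected descriptor:", best[2], "test RMSE:", best[0])
```

Selecting on held-out data rather than training data is what ties the choice to generalization, which matters when the final targets (cyclic peptides) differ from the training compounds (small molecules).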

Summary

Introduction

Cyclic peptide-based drug discovery is attracting increasing interest owing to its potential to avoid target protein depletion. Plasma protein binding (PPB) is the most important index of biostability, and developing a computational method to predict the PPB of drug candidate compounds contributes to the acceleration of drug discovery research. As with monoclonal antibody therapeutics, cyclic peptides can bind to target proteins with high affinities [5]. They can interact with flat, shallow, and featureless surfaces of proteins or protein-protein interaction interfaces that are barely targeted by small molecule drugs [6]. They have the potential for oral activity or oral bioavailability, similar to classical small molecule drugs [7,8,9,10,11,12,13]. De novo rational design techniques [20,21,22] and random screening techniques [23,24] have facilitated the development of novel cyclic peptide ligands for difficult targets [25,26,27,28].

Objectives

Methods

Results

Discussion

Conclusion