Scalable Prediction of Compound-protein Interaction on Compressed Molecular Fingerprints

Yasuo Tabei

doi:10.1002/minf.201900130

Abstract

Predicting compound-protein interactions from molecular fingerprints has become an important challenge in pharmaceutical science for efficient drug discovery. We review two scalable methods for predicting drug-protein interactions on fingerprints. In particular, we introduce two techniques for learning statistical models using lossless and lossy data compression. The first uses a trie representation of fingerprints, which enables predictive models to be learned directly on the compressed format. The second uses a form of lossy data compression called feature maps (FMs). Many FMs for kernel approximation have recently been proposed, and minwise hashing, one method of this kind, has been applied to the prediction of compound-protein interactions and shown to be effective. Overall, we show that learning statistical models on the compressed format is effective for predicting compound-protein interactions on a large scale.

Full Text