Machine learning-based techniques are emerging as state-of-the-art methods in chemoinformatics for selectively, effectively and rapidly identifying biologically relevant molecules in large databases. A multitude of such techniques have been proposed, but their sparse availability and their dependence on high-end computational literacy have hindered wider adoption, at least in the context of G-protein-coupled receptor (GPCR)-associated chemosensory research. Here, we report Machine-OlF-Action (MOA), a user-friendly, open-source computational framework that uses user-supplied SMILES (simplified molecular-input line-entry system) representations of chemicals, along with their activation status, to build classification models. MOA integrates a number of popular chemical databases collectively harboring approximately 103 million chemical moieties. MOA also facilitates customized screening of user-supplied chemical datasets. A key feature of MOA is its ability to embed molecules based on the similarity of their local neighborhood, using the state-of-the-art model interpretability framework LIME. We demonstrate the utility of MOA by identifying previously unreported agonists for the human and mouse olfactory receptors OR1A1 and MOR174-9, respectively, leveraging the chemical features of their known agonists and non-agonists. In summary, we present an ML-powered software playground for performing supervised learning tasks involving chemical compounds. MOA is available for Windows, Mac and Linux operating systems and is accessible at https://ahuja-lab.in/. Source code, user manual, step-by-step guide and support are available on GitHub (https://github.com/the-ahuja-lab/Machine-Olf-Action). For results, reproducibility and hyperparameters, refer to the Supplementary Notes. Supplementary data are available at Bioinformatics online.