Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification

Janna Hastings, Till Mossakowski, Fabian Neuhaus, Adel Memariani, Martin Glauer

doi:10.1186/s13321-021-00500-8


Open Access

https://doi.org/10.1186/s13321-021-00500-8


Abstract

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
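The character-based encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes SMILES as the chemical line notation and maps each character to an integer index, padding to a fixed length. The function names, the padding index, and the example molecules are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a SMILES string as a fixed-length list of character indices."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    # Right-pad short strings so every encoded molecule has the same length.
    return ids + [PAD] * (max_len - len(ids))


# Illustrative inputs: ethanol, benzene, acetic acid.
examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(examples)
encoded = [encode(s, vocab, max_len=8) for s in examples]
```

A purely character-level scheme like this splits two-letter atom symbols such as `Cl` or `Br` into separate characters, so a sequence model has to learn those pairings from context.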

Highlights

In the last decades, significant progress has been made within the life sciences in bringing chemical data into the public domain in open databases such as PubChem [1].
We evaluate several machine learning approaches for their applicability to the problem of classifying novel molecular entities into the ChEBI chemical ontology [11] based on their chemical structures.
There are challenges with the transformation of ChEBI into a form that can be used for this task, which we discuss below. We evaluate both classical machine learning approaches, which learn to predict a single “best match” class for an input molecule, and artificial neural networks, which learn to predict a likelihood of class membership for every class that the network knows about, given an input molecule.
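The difference between the two prediction styles described above can be sketched as follows. This is a minimal illustration, assuming the network emits one raw score per class that is passed through a sigmoid and compared with a cutoff of 0.5. The class names, scores, and threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def best_match(scores: Dict[str, float]) -> str:
    """Single-label prediction: always return the highest-scoring class."""
    return max(scores, key=scores.get)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def multi_label(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Per-class prediction: keep every class whose likelihood meets the threshold."""
    return sorted(c for c, z in logits.items() if sigmoid(z) >= threshold)


# Hypothetical scores for one molecule over three illustrative classes.
confident = {"alcohol": 0.8, "carboxylic acid": 1.3, "ether": -0.7}
uncertain = {"alcohol": -2.0, "carboxylic acid": -0.4, "ether": -1.1}

single = best_match(uncertain)         # a class is returned even for weak scores
overlapping = multi_label(confident)   # several classes can be predicted at once
nothing = multi_label(uncertain)       # no class reaches the threshold
```

Under these assumptions the per-class approach can assign several overlapping classes to one molecule, but it can also return an empty list when no likelihood reaches the cutoff. This is one way a model of this kind may fail to make a class prediction for every molecule.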

Summary

Introduction

Significant progress has been made within the life sciences in bringing chemical data into the public domain in open databases such as PubChem [1]. These resources are massive in scale: as of February 2021, PubChem contains approximately 110 million structurally distinct entries. ChEBI has been widely adopted throughout the life sciences, and can be considered the “gold standard” chemical ontology in the public domain. It has been applied for multiple purposes, including in support of the bioinformatics and systems biology of metabolism [14], biological data interpretation [15, 16], natural language processing [17], and as a chemistry component for the semantic web. However, ChEBI is manually maintained and cannot easily scale to the full scope of public chemical data. This hinders applications in the context of investigations into large-scale molecular systems such as whole-genome metabolism, for which it is important that the knowledge base be as complete as possible [21].

Journal: Journal of Cheminformatics
Publication Date: Mar 16, 2021
Citations: 19
License type: open-access
