Abstract

Bioactive compounds are often used as starting points for many therapeutic agents. In recent years, theoretical and practical innovations in hardware-assisted, fast-evolving machine learning (ML) have made it possible to identify desired bioactive compounds in chemical spaces, such as those of natural products (NPs). This review introduces how machine learning approaches can be used to identify and evaluate bioactive compounds. It also provides an overview of recent research trends in machine learning-based prediction and evaluation of bioactive compounds by listing real-world examples along with their various input data. In addition, several ML-based approaches to identifying specific bioactive compounds for cardiovascular and metabolic diseases are described. Overall, these approaches are important for the discovery of novel bioactive compounds and provide new insights into the machine learning basis for various traditional applications of bioactive compound-related research.

Highlights

  • Bioactive compounds are chemicals that exist in trace amounts in natural products (NPs) from plants and animals

  • Many approved drugs are structurally similar to bioactive compounds [8], providing a clue that finding bioactive compound analogs may be the right strategy for novel drug development

  • Although machine learning (ML)-based approaches typically achieve high performance for classification or regression tasks, some limitations still prevent them from being widely applied in NP-related research

Summary

Introduction

Bioactive compounds are chemicals that exist in trace amounts in natural products (NPs) from plants and animals. Another obstacle to applying machine learning technology to gene expression data is that individual gene expression datasets have been processed in different analysis pipelines, so hard-to-identify bias (noise), such as batch effects, different sequencing devices, and/or varying experimental conditions, is inherent to the data. To minimize this bias and provide vast transcriptome resources in a user-friendly manner, a recent study constructed a transcriptome database called ARCHS4 (https://maayanlab.cloud/archs4/ (accessed on 10 December 2021)), comprising more than 200,000 human and mouse transcripts processed through a uniform analysis pipeline [41]. Advances in cheminformatics, bioinformatics, and a variety of publicly accessible databases have emerged rapidly over the past decade, accelerating the process of interconnecting chemistry, biology, and drug development. This system will be used to expand our limited understanding of chemicals and build promising ML-based models that generate novel bioactive-like compounds, with high efficacy and low toxicity, against various diseases, including cardiovascular and metabolic diseases.

Chemical Space Where Unidentified Bioactive Compounds Exist
How a Machine Learns from Data and Creates a Model for a Task Using Machine
Conclusions
Findings
Future Perspective