Abstract

Testing potential drug treatments in animal disease models is a decisive step in all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced by in vivo bioassays. This is partly due to their complexity and the lack of accepted reporting standards: publicly available animal screening data are only accessible in unstructured free-text format, which hinders computational analysis. In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening-related assays in rats and mice. We retrieve our dataset from ChEMBL, an open-source literature-based database focused on preclinical drug discovery. We show that in vivo assay descriptions can be effectively mined for relevant information, including experimental factors that might influence the outcome and reproducibility of animal research: genetic strains, experimental treatments, and phenotypic readouts used in the experiments. We further systematize the extracted information using an unsupervised language model (Word2Vec), which learns semantic similarities between terms and phrases, allowing identification of related animal models and classification of entire assay descriptions. In addition, we show that random forest models trained on features generated by Word2Vec can predict the class of drugs tested in different in vivo assays with high accuracy. Finally, we combine information mined from text with curated annotations stored in ChEMBL to investigate the patterns of usage of different animal models across a range of experiments, drug classes, and disease areas.
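
As a rough illustration of how term embeddings might be turned into description-level features, the sketch below averages per-term vectors into one vector per assay description and compares descriptions by cosine similarity. It is a minimal sketch under stated assumptions, not the method used in the study: the averaging scheme, the function names, the example descriptions, and the three-dimensional `TOY_VECTORS` are all invented for illustration and stand in for embeddings that a trained Word2Vec model would supply.

```python
import math

# Hypothetical 3-dimensional term vectors standing in for trained Word2Vec
# embeddings. The numbers are invented for illustration only.
TOY_VECTORS = {
    "wistar": [0.9, 0.1, 0.0],
    "sprague-dawley": [0.8, 0.2, 0.1],
    "carrageenan": [0.1, 0.9, 0.2],
    "edema": [0.2, 0.8, 0.3],
    "seizure": [0.0, 0.2, 0.9],
}


def tokenize(description):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [tok.strip(".,;:()") for tok in description.lower().split()]


def description_vector(description, vectors=TOY_VECTORS):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[tok] for tok in tokenize(description) if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical assay descriptions (not taken from ChEMBL).
edema_wistar = description_vector("Inhibition of paw edema in Wistar rats")
edema_sd = description_vector("Carrageenan edema test in Sprague-Dawley rats")
seizure_mice = description_vector("Anticonvulsant activity against seizure in mice")

print(cosine(edema_wistar, edema_sd))      # two edema assays in rats
print(cosine(edema_wistar, seizure_mice))  # edema assay vs. seizure assay
```

With these toy vectors the two edema descriptions score as more similar to each other than to the seizure description. Fixed-length vectors of this kind are one plausible form of input for a downstream classifier such as a random forest.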

Highlights

  • Testing potential therapeutic compounds in animal disease and safety models is a crucial part of preclinical drug discovery [1]

  • Before exposing human populations to potential drug treatments, novel compounds are tested in living non-human animals—arguably the most physiologically relevant model system known to drug discovery

  • Our results show that text mining and machine learning have the potential to contribute significantly to the ongoing debate on the interpretation and reproducibility of animal model research by enabling access to, integration of, and large-scale analysis of in vivo drug screening data

Introduction

Testing potential therapeutic compounds in animal disease and safety models is a crucial part of preclinical drug discovery [1]. An in vivo assay, depending on the animal species, allows a potentially far more realistic and predictive measure of a compound’s effect than simpler test systems, and can capture the complexity of target engagement, metabolism, and pharmacokinetics required of the final therapeutic drug. Proof of efficacy and safety in animals is usually required by regulatory agencies before a compound can progress into human studies [1, 4]. Drug efficacy tests are carried out in animal models that mimic some aspects of human pathology. Based on how the disease state is created, animal models can be generally classified into three main groups [5]:
