Abstract

Background: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and of the information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases.

Results: Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank “Description”, “Indication”, “Pharmacodynamics” and “Mechanism of Action” text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we use the TextQuest algorithm to find additional biologically significant words. Using a variety of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface.

Conclusions: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.

Highlights

  • We focus on the DrugBank [10,11,12] repository which is a freely available resource that combines detailed information about 7736 drug entries including 1584 FDA-approved small molecule drugs, 158 FDA-approved biotech drugs, 89 nutraceuticals and over 6000 experimental drugs

  • As each DrugBank record consists of various fields, we selected four fields with a high textual information content, more specifically: “Description”, “Indication”, “Pharmacodynamics” and “Mechanism of Action”
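
The idea of grouping records by the terms they share can be illustrated with a minimal sketch. It assumes the terms have already been extracted from the four text fields, and it uses Jaccard similarity with threshold-based linking as a simple stand-in, which is not necessarily one of the similarity or clustering techniques DrugQuest itself applies. The function names, the threshold value and the toy records are all hypothetical and are not taken from DrugBank.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_shared_terms(records, threshold=0.5):
    """Group record ids whose term sets are linked by Jaccard >= threshold.

    Records are merged transitively (connected components); this is an
    illustrative stand-in, not the clustering used by the tool itself.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if jaccard(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy input: record ids mapped to already-extracted terms.
toy_records = {
    "rec1": {"kinase", "inhibitor", "tumor"},
    "rec2": {"kinase", "inhibitor", "leukemia"},
    "rec3": {"antibiotic", "bacteria"},
}
clusters = cluster_by_shared_terms(toy_records, threshold=0.5)
print(clusters)
```

Here `rec1` and `rec2` share two of four distinct terms, so they meet the 0.5 threshold and end up in one group, while `rec3` shares nothing with either and stays alone. Raising the threshold gives smaller, tighter groups.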

Introduction

Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and of the information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Drug discovery, drug repurposing, the organization of chemical compound information in databases, the identification of therapeutic properties and side effects, and the discovery of novel associations between compounds all remain active research fields. Chemical Entities of Biological Interest (ChEBI) database [3, 4] is a freely
