A dictionary to identify small molecules and drugs in free text

Kristina M Hettne, Martijn J Schuemie, Jos Kleinjans, Bob J A Schijvenaars, Jan A Kors, Rob H Stierum, Peter J M Hendriksen, Erik M Van Mulligen

doi:10.1093/bioinformatics/btp535


Open Access


Journal: Bioinformatics
Publication Date: Sep 16, 2009
Citations: 150
License type: other-oa

Affiliation: Maastricht University, Erasmus MC

Abstract

The scientific community has invested considerable effort in correctly identifying gene and protein names in text, but far less in correctly identifying chemical names. Dictionary-based term identification can recognize the diverse representations of chemical information in the literature and map the chemicals to their database identifiers. We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. We applied rule-based term filtering, a manual check of highly frequent terms, and disambiguation rules. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary. The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.
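To make the approach concrete, the following is a minimal Python sketch of dictionary-based term identification with term filtering, followed by a precision and recall calculation against gold annotations. It is an illustration under stated assumptions, not the authors' implementation: the toy dictionary, the identifiers (`CHEM:0001` and so on), the filter rules in `keep_term`, and the `MANUALLY_REJECTED` list are all hypothetical, and disambiguation rules are not modeled.

```python
import re

# Hypothetical toy dictionary: identifier -> synonyms.
# The identifiers and entries are invented for illustration only.
TOY_DICTIONARY = {
    "CHEM:0001": ["aspirin", "acetylsalicylic acid"],
    "CHEM:0002": ["caffeine"],
    "CHEM:0003": ["lead"],  # collides with a common English word
    "CHEM:0004": ["a"],     # too short to be a useful term
}

# Hypothetical outcome of a manual check of highly frequent terms.
MANUALLY_REJECTED = {"lead"}


def keep_term(term):
    """Illustrative filter rules (assumptions, not the rules used in the paper)."""
    if len(term) < 3:
        return False
    if term.lower() in MANUALLY_REJECTED:
        return False
    return True


def build_index(dictionary):
    """Map each surviving lowercased term to its identifier."""
    index = {}
    for identifier, synonyms in dictionary.items():
        for term in synonyms:
            if keep_term(term):
                index[term.lower()] = identifier
    return index


def find_mentions(text, index):
    """Return (start, end, identifier) for each dictionary term found in text."""
    if not index:
        return []
    # Try longer terms first so multi-word names win over shorter ones.
    terms = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), index[m.group(0).lower()])
            for m in pattern.finditer(text)]


def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    index = build_index(TOY_DICTIONARY)
    sentence = "Aspirin and caffeine may lead to interactions."
    print(find_mentions(sentence, index))
```

In this sketch, filtering out "lead" prevents a false positive on the verb in the example sentence, at the cost of missing genuine mentions of the metal. That is the same kind of trade-off the abstract reports: higher precision with a minor loss of recall.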
