Improving chemical entity recognition through h-index based semantic similarity.

Andre Lamurias, João D Ferreira, Francisco M Couto

doi:10.1186/1758-2946-7-s1-s13

Open Access

Journal: Journal of Cheminformatics
Publication Date: Jan 19, 2015
Citations: 35
License type: CC BY 4.0

Affiliation: University of Lisbon

Abstract

Background

Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method to two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version.

Results

For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that, for a fixed recall, we increased precision by excluding a larger number of false positives from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index.

Conclusions

The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.
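
The abstract names two ancestor-based similarity measures, simUI and simGIC, and an h-index adaptation of each. The following is a minimal sketch on a made-up, ChEBI-like toy hierarchy, not the authors' implementation. It takes simUI as the number of shared ancestors divided by the number of ancestors in the union, and simGIC as the same ratio with each ancestor weighted by its information content. Three details are assumptions and may differ from the paper: the intrinsic IC formula (based on descendant counts), the concept h-index (the largest h such that at least h direct children each have at least h direct children), and how the h-index enters the measures (as a weight of h + 1 per ancestor). All function names are hypothetical.

```python
from math import log

# Hypothetical toy is_a hierarchy (child -> parents), loosely ChEBI-like.
# The terms and links are illustrative only, not real ChEBI data.
PARENTS = {
    "chemical entity": [],
    "molecular entity": ["chemical entity"],
    "organic compound": ["molecular entity"],
    "carboxylic acid": ["organic compound"],
    "alcohol": ["organic compound"],
    "acetic acid": ["carboxylic acid"],
    "propionic acid": ["carboxylic acid"],
    "ethanol": ["alcohol"],
    "methanol": ["alcohol"],
    "ion": ["molecular entity"],
    "sodium ion": ["ion"],
}


def ancestors(term):
    """Return the term together with all of its is_a ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def children(term):
    """Direct is_a children of a term."""
    return [c for c, parents in PARENTS.items() if term in parents]


def descendants(term):
    """The term plus everything below it in the hierarchy."""
    return {t for t in PARENTS if term in ancestors(t)}


def ic(term):
    # Assumed intrinsic information content: -log(fraction of the ontology
    # subsumed by the term), so the root scores 0 and leaves score highest.
    return -log(len(descendants(term)) / len(PARENTS))


def h_index(term):
    # Assumed concept h-index: the largest h such that at least h direct
    # children each have at least h direct children of their own.
    counts = sorted((len(children(c)) for c in children(term)), reverse=True)
    h = 0
    for rank, n in enumerate(counts, start=1):
        if n < rank:
            break
        h = rank
    return h


def weighted_overlap(a, b, weight):
    """Weighted size of the shared ancestors over the weighted size of the union."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    shared = sum(weight(t) for t in anc_a & anc_b)
    total = sum(weight(t) for t in anc_a | anc_b)
    # total is 0 only when both terms are the zero-weight root itself.
    return shared / total if total else 1.0


def sim_ui(a, b):
    """simUI: every ancestor counts once."""
    return weighted_overlap(a, b, lambda t: 1.0)


def sim_gic(a, b):
    """simGIC: each ancestor weighted by its information content."""
    return weighted_overlap(a, b, ic)


def sim_ui_h(a, b):
    """Hypothetical h-index variant of simUI (weight h + 1 per ancestor)."""
    return weighted_overlap(a, b, lambda t: h_index(t) + 1)


def sim_gic_h(a, b):
    """Hypothetical h-index variant of simGIC (IC scaled by h + 1)."""
    return weighted_overlap(a, b, lambda t: ic(t) * (h_index(t) + 1))


if __name__ == "__main__":
    for measure in (sim_ui, sim_gic, sim_ui_h, sim_gic_h):
        print(measure.__name__,
              round(measure("acetic acid", "propionic acid"), 3),
              round(measure("acetic acid", "sodium ion"), 3))
```

On this toy hierarchy, sim_ui gives 4/6 for acetic acid vs. propionic acid and 3/7 for acetic acid vs. ethanol, and all four measures rate the two carboxylic acids as more similar than acetic acid and sodium ion. Expressing every variant through one weighted_overlap function reduces the comparison between each original measure and its h-index version to a choice of ancestor weight.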

Highlights

Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings.
The chosen articles were sampled from a list of articles published in 2013 by the top 100 journals in a list of categories related to the chemistry field.
There was no limit on the number of words that could make up a Chemical Entity Mention (CEM), but due to the annotation format, the words had to form a continuous sequence.

Summary

Introduction

Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. The chemical compound and drug named entity recognition (CHEMDNER) task was one of the tasks proposed for the fourth edition of BioCreative. It was essentially a named entity recognition (NER) task for detecting chemical compounds and drugs in MEDLINE documents, in particular those that can be linked to a chemical structure [1]. The task organizers provided a training corpus composed of 10,000 MEDLINE titles and abstracts that were manually annotated by domain experts.
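
The working assumption above, that chemical entities mentioned in the same text fragment should be semantically related, suggests a simple filter: score each candidate by its best similarity to the other candidates in its fragment and discard those below a threshold. The sketch below assumes that max-similarity scoring rule, uses made-up similarity values in place of real simUI or simGIC output, and gives every function a hypothetical name. It also sweeps the threshold and reports precision and recall at each value, the kind of curve used to compare similarity measures.

```python
from itertools import combinations

# Hypothetical pairwise similarities between candidate entities found in one
# text fragment (made-up numbers standing in for simUI/simGIC-style scores).
# "lead" plays a false positive, e.g. the verb "lead" mapped to the element.
TOY_SIM = {
    frozenset({"acetic acid", "propionic acid"}): 0.67,
    frozenset({"acetic acid", "ethanol"}): 0.43,
    frozenset({"propionic acid", "ethanol"}): 0.43,
    frozenset({"acetic acid", "lead"}): 0.15,
    frozenset({"propionic acid", "lead"}): 0.15,
    frozenset({"ethanol", "lead"}): 0.15,
}


def toy_sim(a, b):
    """Look up the toy similarity of two candidates (0.0 if unlisted)."""
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def validation_scores(entities, sim):
    """Score each candidate by its highest similarity to any other candidate
    in the same fragment; an isolated candidate gets 0.0."""
    best = {e: 0.0 for e in entities}
    for a, b in combinations(best, 2):
        s = sim(a, b)
        best[a] = max(best[a], s)
        best[b] = max(best[b], s)
    return best


def precision_recall(predicted, gold):
    """Set-based precision and recall (1.0 by convention for empty sets)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall


def threshold_sweep(scores, gold, thresholds):
    """Keep candidates scoring at or above each threshold; report (t, P, R)."""
    results = []
    for t in thresholds:
        kept = {e for e, s in scores.items() if s >= t}
        results.append((t, *precision_recall(kept, gold)))
    return results


if __name__ == "__main__":
    candidates = ["acetic acid", "propionic acid", "ethanol", "lead"]
    gold = {"acetic acid", "propionic acid", "ethanol"}
    scores = validation_scores(candidates, toy_sim)
    for t, p, r in threshold_sweep(scores, gold, [0.0, 0.2, 0.5]):
        print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

In this toy fragment, raising the threshold from 0.0 to 0.2 removes only the false positive "lead", lifting precision from 0.75 to 1.0 at full recall; at 0.5 the ethanol mention is lost as well and recall drops to 2/3. A measure that separates true from false positives more cleanly keeps more recall at a given precision, which is the basis on which the h-index-based measures are compared with the originals.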

Similar Papers

Towards automatic classification within the ChEBI ontology
Janna Hastings ... Paula De Matos
Nature Precedings | VOL. 4 | 31 Jul 2009

Chemical Entities of Biological Interest: an update
Paula De Matos ... Christoph Steinbeck
Nucleic Acids Research | VOL. 38 | 23 Oct 2009

ChEBI: a database and ontology for chemical entities of biological interest
K Degtyarenko ... M Ashburner
Nucleic Acids Research | VOL. 36 | 23 Dec 2007

Epitopes in ChEBI - A Collaboration with the IEDB
Zara Josephs ... Christoph Steinbeck
Nature Precedings | VOL. 5 | 25 Oct 2010
