Abstract

Background: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult because of both the diverse morphology of chemical names and the alternative types of nomenclature that are used simultaneously to describe them. To address these issues, the last BioCreAtIvE challenge proposed the CHEMDNER task, a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text.

Methods: To address this challenge, we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step, to tag chemical names in a corpus of Medline abstracts. We named our best performing systems CheNER.

Results: We evaluate the performance of the various approaches using the F-score; higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-score of combining our system with ChemSpot, and find an increase from 72.88% to 73.83%.

Conclusions: CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
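For readers unfamiliar with the F-score used in the Results, the following is a minimal sketch of how it can be computed from entity counts. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive
    and false-negative counts. beta=1.0 gives the balanced F1 score."""
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted entities that are correct
    recall = tp / (tp + fn)     # fraction of annotated entities that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example = f_score(tp=80, fp=20, fn=40)
print(f"F1 = {example:.4f}")
```

With these illustrative counts, precision is 0.80 and recall is about 0.67, so the F1 score sits between the two and closer to the lower value.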

Highlights

  • Small chemical molecules regulate biological processes at the molecular level

  • Our results show that combining the result lists of CheNER and ChemSpot improves on the performance of either tool alone (Tables 4, 6, 7)

  • We find that there are 2643 annotated chemical entities that are only recognized by ChemSpot and 2893 annotated chemical entities that are only recognized by CheNER (Table 8)
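The comparison described in the highlights above can be sketched with set operations. This is an illustration under stated assumptions, not the authors' procedure: it assumes each annotation is an exact-match span `(document id, start offset, end offset)` and that the two result lists are combined by a simple union. The names `Span` and `compare_tools` and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed representation: (document id, start offset, end offset).
Span = Tuple[str, int, int]


def compare_tools(gold: Set[Span], tool_a: Set[Span], tool_b: Set[Span]) -> Dict[str, Set[Span]]:
    """Split the gold annotations by which tool recovered them and
    build the combined (union) prediction list."""
    found_a = gold & tool_a  # gold entities recognized by tool A
    found_b = gold & tool_b  # gold entities recognized by tool B
    return {
        "only_a": found_a - found_b,     # recognized by A but not by B
        "only_b": found_b - found_a,     # recognized by B but not by A
        "both": found_a & found_b,       # recognized by both tools
        "combined": tool_a | tool_b,     # union of the two result lists
    }


# Toy data, for illustration only.
gold = {("d1", 0, 7), ("d1", 10, 15), ("d2", 3, 9)}
tool_a = {("d1", 0, 7), ("d2", 3, 9), ("d2", 20, 25)}
tool_b = {("d1", 0, 7), ("d1", 10, 15)}

result = compare_tools(gold, tool_a, tool_b)
print(len(result["only_a"]), len(result["only_b"]), len(result["combined"]))
```

A union raises recall whenever each tool finds entities the other misses, but it also carries over the false positives of both tools, so the net effect on the F-score depends on the balance between the two.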


Summary

Introduction

Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Identifying such molecules in biomedical text is difficult because of both the diverse morphology of chemical names and the alternative types of nomenclature that are used simultaneously to describe them. Various BioCreAtIvE challenge tracks focus on identifying different types of biologically relevant entities, such as genes and their functions, diseases, phenotypes, or chemical compounds. The importance of these chemical compounds arises from their involvement in regulating the biological activity of proteins and genes, and from their potential use to treat pathological states.

