Abstract

The ever-growing availability of biomedical text sources has resulted in a boost in clinical studies based on their exploitation. Biomedical named-entity recognition (bio-NER) techniques have evolved remarkably in recent years and their application in research is increasingly successful. Still, the disparity of tools and the limited validation resources available are barriers to wider adoption, especially within clinical practice. Here we propose the use of omics data and network analysis as an alternative for the assessment of bio-NER tools. Specifically, our method introduces quality criteria based on edge overlap and community detection. Applying these criteria to four bio-NER solutions yielded results comparable to those of strategies based on annotated corpora, without suffering from their limitations. Our approach can serve as a guide both for selecting the best bio-NER tool for a specific task and for creating and validating novel approaches.

Highlights

  • As an alternative to the use of annotated datasets in the development of biomedical named-entity recognition (bio-NER) tools, in this study we present a method based on the exploitation of omics data and network analysis

  • In the case of phenotypic networks, while they present a similar number of nodes, there is a significant variation in the number of edges (12,499 for BERN versus 595,110 for MetaMap Lite)

  • The density values, which range between 0.008 and 0.033, reflect this disparity and coincide with those of other phenotypic disease networks obtained from medical text mining [27]
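For readers unfamiliar with the density measure mentioned in the highlights, the following is a minimal sketch assuming the networks are simple undirected graphs, for which density is conventionally defined as 2E / (N(N - 1)). The function name and the toy numbers are illustrative assumptions; they are not taken from the study, which does not report node counts in this excerpt.

```python
def undirected_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2E / (N * (N - 1)).

    Assumes no self-loops and no parallel edges.
    """
    if num_nodes < 2:
        # A graph with fewer than two nodes has no possible edges.
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Toy values chosen for illustration only (not data from the paper).
toy_density = undirected_density(num_nodes=5, num_edges=4)
print(toy_density)
```

Because density normalises the edge count by the number of possible node pairs, two networks with a similar number of nodes but very different edge counts end up with very different densities.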


Summary

Introduction

As an alternative to the use of annotated datasets in the development of bio-NER tools, in this study we present a method based on the exploitation of omics data and network analysis. Previous work observed that disease networks obtained from medical texts tended to form clear, highly interconnected communities, which coincided significantly with the disease categories of classification systems such as the Disease Ontology (DO) and the Medical Subject Headings (MeSH) [22,23]. Given these precedents, our hypothesis is that the accuracy of a bio-NER tool can be measured by building a disease network from the extracted entities and calculating both its overlap with omics networks and the coincidence of its communities with the categories of disease classification systems.

Table (labels only; values not recovered): Network, Genomic, Proteomic, Pharmacologic; MetaMap, MetaMap (negation), MetaMap Lite, MetaMap Lite (negation), CLAMP, CLAMP (negation), BERN, DISNET.
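The two quantities named in the hypothesis can be illustrated with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: edge overlap is approximated here by the shared-edge fraction and the Jaccard index, community coincidence by a simple majority-category purity score, and the communities are assumed to have been detected beforehand by some community detection algorithm. All function names and the toy disease data are hypothetical.

```python
from collections import Counter


def normalize_edges(edges):
    """Treat edges as undirected and drop self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def edge_overlap(text_edges, omics_edges):
    """Return (fraction of text-derived edges found in the omics network,
    Jaccard index of the two edge sets)."""
    text_set = normalize_edges(text_edges)
    omics_set = normalize_edges(omics_edges)
    shared = text_set & omics_set
    union = text_set | omics_set
    shared_fraction = len(shared) / len(text_set) if text_set else 0.0
    jaccard = len(shared) / len(union) if union else 0.0
    return shared_fraction, jaccard


def community_purity(communities, category_of):
    """Share of diseases that belong to the majority category of their community.

    `communities` is an iterable of sets of disease names (assumed precomputed);
    `category_of` maps a disease name to its category in a classification system.
    """
    matched = 0
    total = 0
    for members in communities:
        labels = [category_of[d] for d in members if d in category_of]
        if not labels:
            continue
        # Count how many members fall into the most common category.
        matched += Counter(labels).most_common(1)[0][1]
        total += len(labels)
    return matched / total if total else 0.0


# Toy data for illustration only (not from the study).
text_edges = [("asthma", "copd"), ("copd", "asthma"),
              ("asthma", "eczema"), ("diabetes", "obesity")]
omics_edges = [("asthma", "eczema"), ("diabetes", "obesity"),
               ("diabetes", "hypertension")]
communities = [{"asthma", "copd", "eczema"}, {"diabetes", "obesity"}]
category_of = {"asthma": "respiratory", "copd": "respiratory",
               "eczema": "skin", "diabetes": "metabolic",
               "obesity": "metabolic"}

shared_fraction, jaccard = edge_overlap(text_edges, omics_edges)
purity = community_purity(communities, category_of)
print(shared_fraction, jaccard, purity)
```

In this toy case two of the three distinct text-derived edges also appear in the omics network, and four of the five diseases sit in a community dominated by their own category. Under the hypothesis, a better bio-NER tool would be expected to score higher on both kinds of measure.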

