Abstract

Background: The identification of mentions of genes or gene products in biomedical texts is a critical step in the development of text mining applications in the biosciences. The complexity and ambiguity of gene nomenclature make this a very difficult task.

Methods: Here we present a novel approach based on a combination of carefully designed rules and several lexicons of biological concepts, implemented in the Text Detective system. Text Detective can also normalize the gene mentions it finds by linking each one to the appropriate database reference.

Results: In the BioCreAtIvE evaluation, Text Detective achieved 84% precision and 71% recall for task 1A, and 79% precision and 71% recall for mouse genes in task 1B.
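For readers who want a single figure per task, precision and recall can be combined into the balanced F-measure (their harmonic mean). The abstract reports only precision and recall, so the values computed below are derived here for illustration and are not figures stated by the authors; the function name `f_measure` is likewise an assumption of this sketch.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
task_1a = f_measure(0.84, 0.71)        # task 1A
task_1b_mouse = f_measure(0.79, 0.71)  # task 1B, mouse genes

print(f"Task 1A F-measure:         {task_1a:.3f}")
print(f"Task 1B (mouse) F-measure: {task_1b_mouse:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, the identical 71% recall on both tasks keeps the two combined scores close together despite the five-point gap in precision.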

Highlights

  • Identifying the entities and concepts that are mentioned in a text is a mandatory step for systems attempting accurate information retrieval and, especially, information extraction tasks

  • Data for yeast and mouse come from the BioCreAtIvE evaluation; data for human come from a hand-annotated set of 500 articles

  • Text Detective is a rule-based system for annotating and normalizing gene mentions in texts that reaches high precision and recall on this task
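The general idea of combining a lexicon with hand-written rules and then normalizing each hit to a database reference can be sketched as follows. This is a minimal illustration, not Text Detective's implementation: the lexicon entries, the placeholder identifiers (`DB:0001` and so on), the single disambiguation rule, and all names in the code are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: lowercased surface form -> database identifier.
# The identifiers are placeholders, not real database accessions.
LEXICON = {
    "tumor protein p53": "DB:0001",
    "p53": "DB:0001",
    "brca1": "DB:0002",
    "was": "DB:0003",
}

# Hypothetical rule: symbols that are also common English words are
# accepted only when written entirely in upper case.
AMBIGUOUS_SYMBOLS = {"was"}


@dataclass
class Mention:
    start: int
    end: int
    text: str
    db_id: str


def find_gene_mentions(text: str) -> list[Mention]:
    """Return non-overlapping lexicon matches, each with its database ID."""
    mentions: list[Mention] = []
    # Try longer names first so multi-word names win over their substrings.
    for name in sorted(LEXICON, key=len, reverse=True):
        pattern = re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        for match in pattern.finditer(text):
            surface = match.group(0)
            # Rule: reject ambiguous symbols unless they are upper case.
            if name in AMBIGUOUS_SYMBOLS and not surface.isupper():
                continue
            # Skip candidates that overlap an already accepted mention.
            if any(match.start() < m.end and m.start < match.end()
                   for m in mentions):
                continue
            mentions.append(
                Mention(match.start(), match.end(), surface, LEXICON[name])
            )
    return sorted(mentions, key=lambda m: m.start)


if __name__ == "__main__":
    sentence = "It was shown that WAS binds tumor protein p53 and BRCA1."
    for m in find_gene_mentions(sentence):
        print(m.text, "->", m.db_id)
```

In this sketch the lexicon supplies recall (every known surface form is a candidate), while the rules supply precision by discarding candidates that are likely ordinary words. Mapping synonyms such as "p53" and "tumor protein p53" to the same identifier is what the normalization step amounts to here.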


