Abstract

Background

Gene name recognition and normalization is, together with the detection of other named entities, a crucial step in biomedical text mining and the basis for developing more advanced techniques such as the extraction of complex events. While current state-of-the-art solutions achieve highly promising results on average, performance can drop significantly for specific genes with highly ambiguous synonyms. Depending on the topic of interest, this can make extensive manual curation of such text mining results necessary. Our goal was to enhance this curation step using tools widely used in the pharmaceutical industry, utilizing the text processing and classification capabilities of the Konstanz Information Miner (KNIME) along with publicly available sources.

Results

The F-score achieved on gene-specific test corpora for highly ambiguous genes could be improved from values close to zero, caused by very low precision, to values >0.9 in several cases. Interestingly, the presented approach even increased the F-score for genes that already showed good results in the initial gene name normalization. For most test cases, we could significantly improve precision while retaining high recall.

Conclusions

We could show that KNIME can be used to assist in the manual curation of text mining results containing high numbers of false positive hits. Our results also indicate that future development in the field of gene name normalization could benefit from gene-specific training corpora built from the genes that current state-of-the-art algorithms commonly identify incorrectly.
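
The link between very low precision and an F-score close to zero, as reported in the Results, can be illustrated with a short calculation. The sketch below assumes the balanced F1 score (harmonic mean of precision and recall); the abstract does not state which F-score variant was used. The function name `f_score` and all numeric values are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for the given precision and recall.

    With beta = 1.0 this is the balanced F1 score (an assumption here,
    since the abstract only says "F-score").
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both values are zero: define the score as 0.0 to avoid dividing by zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical values: an ambiguous gene synonym yields many false positives,
# so precision is tiny even though recall is high.
before_curation = f_score(precision=0.01, recall=0.95)

# Hypothetical values after filtering out most false positives.
after_curation = f_score(precision=0.92, recall=0.93)

print(f"before: {before_curation:.3f}, after: {after_curation:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, high recall cannot compensate for precision near zero, which is why removing false positives has such a large effect on the score.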
