Poster: Distinguishing scientific abbreviations and genes in bio-medical literature mining

Guozhen Liu, Han Zhang, George J Quellhorst

doi:10.1109/iccabs.2011.5729906

Abstract

The accumulation of biomedical literature makes it increasingly difficult for scientists to keep up with scientific advancements, which calls for text mining tools that collect and integrate data in a high-throughput fashion. A major challenge in biomedical text mining is recognizing genes sensitively and accurately and translating them to their official gene symbols. Gene symbols and their commonly used aliases and synonyms usually derive from an abbreviation of the gene's description. However, many gene symbols and aliases exactly match other abbreviations that are commonly used in scientific literature but do not refer to genes. A systematic study of the abbreviations used in biomedical literature should help improve the accuracy of gene recognition during text mining.
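To make the ambiguity concrete, the following is a minimal sketch, not the method used in the poster. It assumes a tiny hand-written lexicon (`GENE_LEXICON`, with a single illustrative entry treating "ER" as an alias of ESR1) and a simple heuristic that reads a local definition of the form "long form (ABBR)" by matching word initials. The names `GENE_LEXICON`, `local_definitions`, and `classify` are hypothetical and introduced only for illustration.

```python
import re

# Illustrative entry only (an assumption, not data from the poster):
# alias -> (official gene symbol, lower-case long form of the alias).
GENE_LEXICON = {"ER": ("ESR1", "estrogen receptor")}


def local_definitions(text):
    """Map abbreviations defined as 'long form (ABBR)' to their long form.

    Heuristic: take as many words before the parenthesis as the abbreviation
    has letters, and accept them only if their initials spell the abbreviation.
    """
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbr = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbr:
            found[abbr] = " ".join(candidate)
    return found


def classify(token, text):
    """Label a token as a gene mention, a non-gene abbreviation, or ambiguous."""
    if token not in GENE_LEXICON:
        return "not in lexicon"
    symbol, description = GENE_LEXICON[token]
    long_form = local_definitions(text).get(token)
    if long_form is None:
        # No local definition: an exact lexicon match alone cannot decide.
        return "ambiguous"
    if long_form.lower() == description:
        return f"gene:{symbol}"
    return "non-gene abbreviation"


if __name__ == "__main__":
    print(classify("ER", "Proteins fold in the endoplasmic reticulum (ER) before export."))
    print(classify("ER", "Loss of the estrogen receptor (ER) alters signaling."))
    print(classify("ER", "ER levels were measured."))
```

In this sketch the same token is labeled a non-gene abbreviation, a gene, or ambiguous depending only on the surrounding text, which is why exact matching against a gene lexicon is not sufficient on its own.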

Full Text