Abstract

Automatic extraction of abbreviations and their definitions from free-text documents is a valuable task in text mining. Previous work on automatic abbreviation-definition extraction has followed either a heuristic or a machine learning approach. This paper proposes a hybrid model that identifies abbreviation-definition pairs in biomedical text. The proposed system works in two stages: (i) to identify abbreviation-definition pairs, heuristic pattern matching is performed through sequence labeling, using three mapping strategies: Predecessor Term Mapping, Word Level Mapping, and Character Level Mapping; (ii) to validate each identified pair, a single-layer artificial neural network (perceptron) is used. PubMed biomedical abstracts serve as the data source for finding abbreviation-definition pairs. System performance is analyzed across six different entities in biomedical abstracts. The experimental results show that our model achieves a precision of 96.2%, a recall of 92.4%, and an F1 of 94.6%. To cross-validate system performance, the proposed model is also evaluated on two corpora, AB3P and BioADI; the outcomes are discussed in the results section.
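
To make the heuristic identification stage more concrete, the following is a minimal sketch of one possible character-level matching rule. It is an illustration, not the mapping strategies defined in the paper. The function names (`char_level_match`, `extract_pairs`), the parenthesis pattern, and the `max_extra_words` window size are all hypothetical. The sketch assumes the abbreviation appears in parentheses directly after its definition and accepts the shortest preceding word span that contains the abbreviation's characters in order. The perceptron validation stage is not shown.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: a parenthesized short form such as "(ANN)".
PAREN_RE = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def char_level_match(abbr: str, candidate_words: List[str]) -> Optional[List[str]]:
    """Return the shortest suffix of candidate_words that starts with the
    abbreviation's first letter and contains all of its characters in order."""
    abbr_l = abbr.lower()
    # Grow the candidate span leftwards, one word at a time.
    for start in range(len(candidate_words) - 1, -1, -1):
        words = candidate_words[start:]
        if not words[0].lower().startswith(abbr_l[0]):
            continue
        text = " ".join(words).lower()
        pos = 0
        for ch in abbr_l:
            pos = text.find(ch, pos)
            if pos == -1:
                break  # this span cannot explain the abbreviation
            pos += 1
        else:
            return words
    return None


def extract_pairs(sentence: str, max_extra_words: int = 2) -> List[Tuple[str, str]]:
    """Find (abbreviation, definition) candidates in one sentence."""
    pairs = []
    for m in PAREN_RE.finditer(sentence):
        abbr = m.group(1)
        preceding = re.findall(r"[A-Za-z0-9-]+", sentence[:m.start()])
        # Hypothetical window: abbreviation length plus a few extra words.
        window = preceding[-(len(abbr) + max_extra_words):]
        match = char_level_match(abbr, window)
        if match:
            pairs.append((abbr, " ".join(match)))
    return pairs


print(extract_pairs("An artificial neural network (ANN) validates each pair."))
```

In a hybrid pipeline of the kind described above, candidates produced by a rule like this would then be passed to a learned validator, which can reject spurious matches that a purely character-based rule lets through.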
