A New Method for Abbreviation Prediction via CNN-BLSTM-CRF

Jianyu Zheng, Xinge Xiao, Lijiao Yang, Bihua Wang, Yun Zhu

doi:10.1088/1742-6596/1267/1/012001

Abstract

Processing abbreviations is a crucial problem in natural language processing. The most common approach is to build a reference database by predicting each abbreviation from its fully expanded form. Previous work on abbreviation prediction mostly relies on traditional machine learning algorithms, which inevitably require large amounts of manual annotation or expert knowledge to establish a feature system. This paper proposes a neural network model based on CNN-BLSTM-CRF that predicts Chinese abbreviations better without relying heavily on a feature system. First, a convolutional neural network extracts phrase and Chinese character information from the fully expanded form; then a BLSTM-CRF deep network annotates the fully expanded form so that the corresponding abbreviation can be extracted. Experimental results show that the proposed method outperforms the state-of-the-art traditional machine learning method, and the results provide a reference for abbreviation research and the construction of resource repositories.
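
The annotate-then-extract step described in the abstract can be illustrated with a minimal sketch. It assumes a binary per-character label scheme ("K" for keep, "S" for skip); this scheme, the function name `extract_abbreviation`, and the hand-written labels are hypothetical illustrations, since the abstract does not state the tag set the model actually uses. In the proposed method the labels would come from the BLSTM-CRF tagger rather than being supplied by hand.

```python
def extract_abbreviation(full_form, labels, keep_label="K"):
    """Build an abbreviation from per-character labels.

    full_form:  the fully expanded form, one label per character.
    labels:     sequence of tags; characters tagged `keep_label` are retained.
    The "K"/"S" tag set is an assumption made for this sketch.
    """
    if len(full_form) != len(labels):
        raise ValueError("each character needs exactly one label")
    # Keep only the characters whose label marks them as part of the abbreviation.
    return "".join(ch for ch, lab in zip(full_form, labels) if lab == keep_label)


# Hand-written labels standing in for a tagger's output (hypothetical).
full_form = "北京大学"            # "Peking University"
labels = ["K", "S", "K", "S"]    # keep the 1st and 3rd characters
print(extract_abbreviation(full_form, labels))
```

Framing the task as sequence labeling means the abbreviation is always a subsequence of the fully expanded form, so the model only has to decide, character by character, what to keep.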

Full Text