Abstract

Socio-economic indicators are powerful instruments for measuring economic conditions, and extracting them can help people grasp economic trends and make decisions. Traditional machine learning methods for indicator extraction rely heavily on handcrafted features, which cost a large amount of human effort. Deep learning methods avoid this problem but require a huge amount of labeled data, which is the trickiest challenge, as labeled data for the indicator extraction task is quite scarce. In this paper, we use a BERT-based model to address these challenges. The model first represents the input text with BERT, taking advantage of BERT's strong ability to capture generic language features. It then fine-tunes the pre-trained model on the labeled data of our indicator extraction task to learn task-specific features. Finally, the representations pass through a conditional random field (CRF) layer to predict the tag sequence over the output tokens. In this way, our model does not require much labeled data, yet it automatically and sufficiently captures the language features of the input text. Additionally, this paper constructs a medium-scale dataset for the fine-tuning process and evaluates our model on it. The results demonstrate that the BERT-based model is superior to several strong baselines.
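The final step of the pipeline described above is CRF decoding: given per-token label scores (here, the ones a fine-tuned BERT encoder would emit) and label-transition scores, Viterbi search recovers the highest-scoring tag sequence. The sketch below is illustrative only; the BIO label set, scores, and example tokens are assumptions, not the paper's actual parameters.

```python
# Minimal sketch of CRF (Viterbi) decoding over token label scores.
# Emission scores stand in for a fine-tuned BERT encoder's outputs;
# all numbers are illustrative, not the paper's parameters.

def viterbi_decode(emissions, transitions, labels):
    """emissions: list of {label: score} dicts, one per token;
    transitions: {(prev_label, cur_label): score};
    returns the best-scoring label sequence."""
    # Best path score ending in each label at the first token.
    best = {lab: emissions[0][lab] for lab in labels}
    backptrs = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for cur in labels:
            # Pick the previous label that maximizes the path score.
            prev = max(labels, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + em[cur]
            ptr[cur] = prev
        best = new_best
        backptrs.append(ptr)
    # Backtrack from the highest-scoring final label.
    last = max(labels, key=lambda lab: best[lab])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical example: tagging "GDP growth rose" with BIO labels,
# where B-IND/I-IND mark an indicator span.
labels = ["B-IND", "I-IND", "O"]
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("O", "I-IND")] = -5.0  # penalize I-IND without a preceding B/I
emissions = [
    {"B-IND": 2.0, "I-IND": 1.5, "O": 0.1},  # "GDP"
    {"B-IND": 0.2, "I-IND": 1.8, "O": 0.5},  # "growth"
    {"B-IND": 0.1, "I-IND": 0.3, "O": 2.0},  # "rose"
]
print(viterbi_decode(emissions, transitions, labels))
# → ['B-IND', 'I-IND', 'O']
```

The transition penalty is what distinguishes a CRF from independent per-token classification: even if a token's emission score slightly favors I-IND, an invalid O→I-IND jump is suppressed, so predicted spans stay well-formed.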


