Abstract

As the volume of financial literature grows rapidly, financial text mining is becoming increasingly important. In recent years, extracting valuable information from financial documents, namely financial text mining, has gained significant popularity within research communities. Although deep learning-based financial text mining has achieved remarkable progress recently, it still suffers from a lack of task-specific labeled training data in the financial domain. To alleviate this issue, we present a pretrained financial text encoder, named F-BERT, a domain-specific language model pretrained on large-scale financial corpora. Unlike the original BERT, the proposed F-BERT is continually trained on both a general corpus and a financial domain corpus, and its four pretraining tasks are learned through lifelong learning, enabling F-BERT to continually capture language knowledge and semantic information. The experimental results demonstrate that the proposed F-BERT achieves strong performance on several financial text mining tasks. Extensive experiments further show the effectiveness and robustness of F-BERT. The source code and pretrained models of F-BERT are available online.

Highlights

  • In finance and economics, various sources of financial text data are used to analyze and predict future financial market trends

  • We present the F-BERT model, which addresses this issue by leveraging unsupervised transfer learning and lifelong learning

  • In the masked language modeling task, to prevent the model from simply copying the input, 80% of the selected tokens are replaced by a special [MASK] symbol, 10% are replaced by a random token from the vocabulary, and 10% are left unchanged (see the masking sketch below); the Next Sentence Prediction (NSP) pretraining task is a binary classification task whose aim is to predict whether two sentences are consecutive
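The 80%/10%/10% rule above is the standard BERT-style masked language modeling corruption. The minimal Python sketch below illustrates it; the function name `mask_tokens`, the `[MASK]` placeholder string, and the use of raw token strings instead of subword ids are illustrative assumptions, not code from the paper.

```python
import random

MASK_TOKEN = "[MASK]"  # illustrative placeholder; real tokenizers use a mask token id

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """Select ~15% of tokens for prediction, then replace 80% of the selected
    tokens with [MASK], 10% with a random vocabulary token, and leave 10% unchanged."""
    masked = list(tokens)
    labels = [None] * len(tokens)          # only selected positions receive a label
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:    # token is selected for prediction
            labels[i] = tok
            r = random.random()
            if r < 0.8:                    # 80%: replace with [MASK]
                masked[i] = MASK_TOKEN
            elif r < 0.9:                  # 10%: replace with a random token
                masked[i] = random.choice(vocab)
            # else 10%: keep the original token unchanged
    return masked, labels
```

In practice the same procedure operates on the token ids produced by the model's tokenizer rather than on raw strings, but the 80/10/10 branching is identical.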


Summary

INTRODUCTION

Various sources of financial text data are used to analyze and predict future financial market trends. In financial text mining tasks, constructing supervised training data is prohibitively expensive because it requires expert knowledge in finance. The proposed F-BERT differs from standard PLM pretraining methods: it constructs four self-supervised pretraining tasks (subsection III-A), trained simultaneously on general corpora and financial domain corpora and learned through lifelong learning, enabling the model to continually capture language knowledge and semantic information from large-scale pretraining corpora. The result is a pretrained financial text encoder that requires only minimal task-specific architecture modifications and can be effectively applied to various financial text mining tasks (see the fine-tuning sketch below).
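To illustrate what "minimal task-specific architecture modifications" means in practice, the sketch below applies a BERT-style financial encoder to sentence classification by attaching only a linear classification head. It assumes the Hugging Face transformers library; the checkpoint name "f-bert-base" is a placeholder rather than the paper's released model path, and the three-label sentiment setup is an illustrative example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "f-bert-base" is a placeholder; substitute the released F-BERT checkpoint path.
checkpoint = "f-bert-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# The only task-specific addition is a linear classification head on top of the
# pretrained encoder, e.g. 3 labels for negative/neutral/positive sentiment.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

inputs = tokenizer(
    "The company reported better-than-expected quarterly earnings.",
    return_tensors="pt", truncation=True, max_length=128,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted label id
```

The same pattern extends to other financial text mining tasks (e.g. token-level tagging or question answering) by swapping the head while keeping the pretrained encoder unchanged.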

BACKGROUND
PROPOSED MODEL
ABLATION STUDY AND ANALYSES
ON-LINE EVALUATION
Findings
CONCLUSION
