Abstract

Stock comments published by experts are important references for accurate stock trend prediction. Capturing the topic of expert stock comments comprehensively and accurately is therefore an important problem, and it is a text classification task. The Bidirectional Encoder Representations from Transformers (BERT) pretrained language model is widely used for text classification because of its high accuracy. However, BERT has two limitations. First, it accepts only input of a fixed maximum length, which leads to suboptimal performance when exploiting the information in long texts. Second, it relies only on the features extracted from the last layer, so the classification features are incomplete. To address these issues, we propose a multi-layer feature ablation approach based on the BERT model for accurate identification of the themes of stock comments. Specifically, we first split the original text with a sliding window so that each segment meets the length requirement of the BERT model. This enlarges the sample size, which helps reduce over-fitting. At the same time, because the long text is divided into multiple short texts, the information in the full text can be captured by combining the topic information of the individual short texts. In addition, we extract the output features of every layer of the BERT model and apply an ablation strategy to select the more effective information among these features. Experimental results show that segmenting stock comments with a sliding window improves topic recognition accuracy compared with using unsegmented comments, which indicates that segmenting text can improve text classification performance. Compared with the standard BERT model, the proposed multi-layer feature ablation approach further improves topic recognition for stock comments and can serve as a reference for investors. By recognizing the topics of stock comments more accurately, our approach offers better performance and practicality for stock trend prediction.
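
The sliding-window segmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size of 510 tokens (BERT's usual 512-token limit minus the two special tokens), the stride of 255, and the majority vote used to combine segment-level topics are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def sliding_windows(tokens, window=510, stride=255):
    """Split a token list into overlapping segments of at most `window` tokens.

    `window` and `stride` are illustrative defaults, not values from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        # Short texts already fit and are returned as a single segment.
        return [list(tokens)]
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            # The current segment reaches the end of the text.
            break
        start += stride
    return chunks


def majority_topic(chunk_labels):
    """Combine per-segment topic labels by majority vote (an assumed rule)."""
    return Counter(chunk_labels).most_common(1)[0][0]


if __name__ == "__main__":
    tokens = list(range(10))
    for chunk in sliding_windows(tokens, window=4, stride=2):
        print(chunk)
    print(majority_topic(["policy", "earnings", "policy"]))
```

Each segment would be classified separately, so one long comment yields several training samples, and the segment-level predictions are then combined into one topic for the whole comment.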
