Abstract

Constructing domain-specific sentiment lexicons has become an important way to improve the performance of sentiment analysis in recent years. The stock market is one of the major application areas of sentiment analysis, and several studies have built lexicons for it. However, while these studies account for the heterogeneity of the stock market relative to other domains, they ignore the heterogeneity of the stock market itself under different market conditions. They also depend on an annotated corpus, which is very difficult to obtain, especially for social media text that is non-standardized, domain-specific, and large in volume; both manual and automatic labeling have limitations. In addition, stock market sentiment lexicons are still evaluated with general classification metrics, which ignores the ultimate purpose of sentiment analysis in the stock market: helping market participants make investment decisions, that is, achieve the highest profit. To address these problems, this paper proposes a new unsupervised method for constructing a stock market sentiment lexicon that is based on the heterogeneity of the stock market, together with a method for evaluating stock market sentiment lexicons. We select four commonly used Chinese sentiment dictionaries as benchmark lexicons and verify the method on an unlabeled corpus of Eastmoney stock posts containing 15,733,552 posts about 2400 Chinese A-share listed companies. Under our lexicon evaluation framework, which is based on portfolio annualized return, the stock market sentiment lexicon constructed in this paper achieves the best performance.
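To make the evaluation criterion concrete, the following is a minimal sketch of how a portfolio's annualized return could be computed from per-period simple returns. The abstract does not give the exact formula, so the geometric compounding used here, the function name `annualized_return`, and the default of 252 trading periods per year are all assumptions for illustration, not the paper's definition.

```python
from typing import Sequence


def annualized_return(period_returns: Sequence[float],
                      periods_per_year: int = 252) -> float:
    """Geometric annualized return from simple per-period returns.

    Assumption: returns are compounded, and periods_per_year (252 here)
    is an illustrative default, not a value taken from the paper.
    """
    if len(period_returns) == 0:
        raise ValueError("at least one return is required")

    # Compound the per-period returns into a total growth factor.
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r

    # Rescale the growth factor to a one-year horizon.
    return growth ** (periods_per_year / len(period_returns)) - 1.0
```

Under such a criterion, candidate lexicons would be compared by the annualized return of the portfolios that their sentiment signals produce, rather than by classification accuracy.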
