Abstract

Stock trend prediction is an intriguing topic that has been studied extensively by researchers from diverse fields. Machine learning, a well-established family of algorithms, has also been studied for its potential in predicting financial markets. In this paper, seven data mining techniques are applied to predict the price movement of the Shanghai Composite Index: support vector machine, logistic regression, naive Bayes, k-nearest neighbor classification, decision tree, random forest, and AdaBoost. Using comments extracted between April 2017 and May 2018, the results show that: 1) sentiment derived from Eastmoney, a social media platform for the financial community in China, further enhances model performance; 2) for classifying positive and negative sentiment, all classifiers reach at least 75% accuracy and the linear support vector classification (SVC) models perform best; 3) given the strong correlation between price fluctuation and the bullish index, the approximate overall trend of the closing price can be obtained.

Highlights

  • With the rapid development of an economy-oriented society, investor sentiment has received increasing attention

  • Data preparation and feature engineering: in the first phase, after data pre-processing (word segmentation, stop word removal, and tokenization), we use the unigram TF-IDF metric, a measure of word importance in a document that is the product of term frequency (TF) and inverse document frequency (IDF)
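
The unigram TF-IDF weighting described in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy English comments, the `STOP_WORDS` set, and the helper names `tokenize` and `tfidf` are all hypothetical, whitespace splitting stands in for the word segmentation step, and the unsmoothed IDF form log(N / df) is an assumption because the source does not state which variant was used.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "is", "a", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words.

    Whitespace splitting is a stand-in for real word segmentation.
    """
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document (unigram TF * IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # TF is the relative frequency; IDF is log(N / df), unsmoothed.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy comments for illustration only; not taken from the Eastmoney data.
comments = [
    "the market is going to rally",
    "the market is going to crash",
    "rally rally rally",
]
vectors = tfidf(comments)
```

In this sketch a term that appears in fewer documents, such as "crash", receives a higher weight than an equally frequent term that appears in more documents, such as "market". The resulting vectors could then be passed to any of the seven classifiers.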

Summary

Introduction

With the rapid development of an economy-oriented society, investor sentiment has received increasing attention. The prediction of share returns from mood states can be seen as a market anomaly that contradicts the efficient market hypothesis [20]. These mood-related anomalies can be explained by the misattribution bias, according to which people make risky decisions depending on their mood states [21]. Because individual emotion is a vague concept, previous research has made significant progress on sentiment techniques that track indicators of public mood directly from social media content, such as Facebook and Twitter feeds [26] [27] [28] [29] [30]. We aim to analyze individual sentiment by assessing the accuracy of seven machine learning algorithms in classifying financial stock comments into positive and negative classes. We assess the effect of including public mood information on the accuracy of a “baseline” prediction model rather than proposing an optimal prediction model.

