Abstract

In recent years, hybrid and ensemble methods have attracted the attention of the data mining community. In areas of computational intelligence such as machine learning, constructing and adapting hybrid models has become essential to achieving good performance. However, the accuracy of stock market classification models remains low, and this has negatively affected stock market indicators. Furthermore, several factors that directly affect the accuracy of classification models were not addressed by previous research, such as the automatic labelling technique, which yields low classification accuracy in the absence of a specific lexicon, and the suitability of the classifiers to the data features and domain. This research proposes a model designed to enhance classification accuracy by incorporating a labelling technique based on stock market domain experts and by constructing an ensemble of Naïve Bayes classifiers to classify stock market sentiments. The methodology consists of five phases. The first phase is data collection. The second is labelling, in which the polarity of the data is specified and negative, positive or neutral values are assigned. The third is data pre-processing. The fourth is classification, in which suitable patterns of the stock market are identified by the ensemble Naïve Bayes classifiers. The final phase is performance evaluation. The classification method achieved an accuracy of more than 89%.
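The classification phase described above can be illustrated with a minimal sketch. The abstract does not say how the individual Naïve Bayes classifiers are combined, so the bootstrap sampling (bagging) and majority voting used here are assumptions, as are the names MultinomialNB, train_ensemble and ensemble_predict. This is not the authors' implementation. Inputs are assumed to be already labelled and pre-processed into token lists.

```python
import math
import random
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total = len(labels)
        return self

    def predict(self, doc):
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / self.total)
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.token_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def train_ensemble(docs, labels, n_models=5, seed=0):
    """Train n_models classifiers, each on a bootstrap resample (assumed scheme)."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            MultinomialNB().fit([docs[i] for i in idx], [labels[i] for i in idx])
        )
    return models


def ensemble_predict(models, doc):
    """Combine the member predictions by majority vote (assumed scheme)."""
    votes = Counter(m.predict(doc) for m in models)
    return votes.most_common(1)[0][0]
```

Calling train_ensemble on expert-labelled token lists and then ensemble_predict on a new token list returns one of the three polarity labels. Resampling gives each member a slightly different view of the data, which is one common way to build an ensemble from a single classifier type.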

Highlights

  • Accurate classification of the data sources in the stock market domain is necessary for investors to make suitable decisions, such as selling or buying stocks (Hsu, Lessmann, Sung, Ma, & Johnson, 2016; Zhong & Enke, 2017; Alkubaisi, Kamaruddin, & Husni, 2017)

  • The current stock market classification models that apply sentiment analysis to consumers' reactions suffer from low classification accuracy when implemented on a dataset drawn from different sources (Zhang, 2013; Navale et al., 2016; Arvanitis & Bassiliades, 2017)

  • The proposed method was compared with the baseline Naïve Bayes Classifiers (NBCs) to assess how the ensemble and expert labelling improve stock market classification accuracy


Summary

Introduction

Accurate classification of the data sources in the stock market domain is necessary for investors to make suitable decisions, such as selling or buying stocks (Hsu, Lessmann, Sung, Ma, & Johnson, 2016; Zhong & Enke, 2017; Alkubaisi, Kamaruddin, & Husni, 2017). The current stock market classification models that apply sentiment analysis to consumers' reactions suffer from low classification accuracy when implemented on a dataset drawn from different sources (Zhang, 2013; Navale et al., 2016; Arvanitis & Bassiliades, 2017). Many factors directly affect the accuracy of classification models, such as sample size, labelling technique and classification method (Jiang et al., 2007; Sathyadevan et al., 2014). In the absence of a specific lexicon, the automatic technique used in the labelling phase lowers the accuracy of the classification model. Automatic labelling recognizes the sentiments expressed in consumers' reactions using existing general lexicons that are not specific to the research domain (He & Zhou, 2011; Makrehchi, Shah, & Liao, 2013). The weakness lies in the automatic assignment of polarity (positive, negative or neutral), which reduces classification accuracy because it does not carry the real weight of the sentiment in each consumer reaction.
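The labelling weakness described above can be made concrete with a small sketch. Every lexicon entry below is hypothetical, and the whitespace tokenization is a simplifying assumption. Neither is taken from the paper, which proposes labelling by domain experts rather than an extended lexicon. The sketch only shows how a general lexicon can miss domain vocabulary and fall back to a neutral label.

```python
# Hypothetical general-purpose sentiment lexicon (token -> polarity score).
GENERAL_LEXICON = {"good": 1, "great": 1, "gain": 1, "bad": -1, "poor": -1, "loss": -1}

# Hypothetical stock market terms that a general lexicon would not contain.
DOMAIN_TERMS = {"bullish": 1, "rally": 1, "bearish": -1, "selloff": -1}


def lexicon_label(text, lexicon):
    """Assign a polarity label by summing the lexicon scores of the tokens."""
    score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reaction = "Analysts turn bearish on the stock"
general_label = lexicon_label(reaction, GENERAL_LEXICON)
domain_label = lexicon_label(reaction, {**GENERAL_LEXICON, **DOMAIN_TERMS})
print(general_label, domain_label)
```

With the general lexicon alone, no token in the reaction is recognized, so the label defaults to neutral. Once the domain terms are included, the same reaction is labelled negative.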
