Accurate stock market forecasts can bring high returns for investors. With the development of machine learning and artificial intelligence technologies, a growing number of studies have applied machine learning to stock prediction tasks. However, accurately predicting stock price trends remains an elusive goal. This is not only because the stock market is affected by policies, the market environment, market sentiment, and other factors, but also because stock price data are inherently complex, noisy, and nonlinear. Many technical indicators have been used as input features for stock prediction models, but the quality of these indicators has long been neglected, so the application of feature engineering to stock prediction needs to be expanded further. Using 18 technical indicators as the original features, this paper presents improved technical indicators based on wavelet denoising and a novel two-stage adaptive feature selection method, and uses a random forest as the stock prediction model. Experiments show that, compared with the original technical indicators, the improved technical indicators significantly enhance model performance (e.g., the F1 score increased by 34.48% on the SSE Composite Index (SSEC) data set, 41.56% on the Hang Seng Index (HSI) data set, 34.48% on the Dow Jones Industrial Average (DJI) data set, and 32.75% on the Standard & Poor's 500 Index (S&P 500) data set). These results verify the importance of technical indicator quality in stock prediction. They also demonstrate the effectiveness of the feature selection method, which achieves higher prediction accuracy with fewer features. In addition, we built multiple data sets with time windows of different sizes to study how window size affects the model. The results show that appropriately increasing the size of the time window can have a positive effect on the model.
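The abstract does not state which wavelet, decomposition level, or thresholding rule is used to improve the indicators. As a hedged illustration only, the sketch below denoises a single indicator series with a multi-level Haar transform and soft thresholding at the universal threshold of Donoho and Johnstone, with the noise level estimated from the finest detail coefficients. The function names and the Haar/soft/universal choices are our assumptions, not the paper's configuration.

```python
import math
from statistics import median

# Assumption: Haar wavelet + soft thresholding + universal threshold.
# This is an illustrative choice, not the configuration used in the paper.

def haar_forward(x):
    """One level of the orthonormal Haar transform -> (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level, doubling the length of the signal."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def soft_threshold(c, lam):
    """Shrink a coefficient toward zero by lam; zero it if |c| <= lam."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def wavelet_denoise(x, levels=2):
    """Denoise a series whose length is divisible by 2**levels (hypothetical helper)."""
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)  # details[0] is the finest scale
    # Robust noise estimate from finest-scale details: MAD / 0.6745 (Gaussian noise).
    sigma = median([abs(c) for c in details[0]]) / 0.6745
    lam = sigma * math.sqrt(2 * math.log(n))  # universal threshold
    # Reconstruct from the coarsest level upward with thresholded details.
    for d in reversed(details):
        approx = haar_inverse(approx, [soft_threshold(c, lam) for c in d])
    return approx
```

Because the transform is applied to a whole series, each denoised value depends on later observations. In a forecasting pipeline one would typically restrict denoising to data available at each prediction time (for example, within each time window) to avoid look-ahead bias.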
Finally, by using our two-stage adaptive feature selection method to remove redundant features, we achieve F1 scores of 0.754 on the SSEC data set, 0.794 on the HSI data set, 0.789 on the DJI data set, and 0.821 on the S&P 500 data set, covering four different stock markets. Overall, this study experimentally verifies that improving feature quality can positively affect model performance, and that choosing an appropriate combination of input features not only improves model performance but also reduces the negative impact of the curse of dimensionality.
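The abstract names a two-stage adaptive feature selection method but does not describe how it works. Purely to illustrate the general idea of first discarding irrelevant features and then redundant ones, the hypothetical sketch below uses a fixed-threshold filter: stage 1 keeps features whose absolute Pearson correlation with the up/down label is at least `relevance_min`, and stage 2 greedily drops any feature highly correlated with one already selected. The names `two_stage_select`, `relevance_min`, and `redundancy_max`, and both threshold values, are our assumptions; this is not the authors' adaptive method.

```python
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def two_stage_select(features, target, relevance_min=0.1, redundancy_max=0.9):
    """Hypothetical filter. features: dict name -> values; target: 0/1 labels."""
    # Stage 1 (relevance): keep features correlated with the label.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    candidates = [name for name in features if relevance[name] >= relevance_min]
    # Stage 2 (redundancy): visit features from most to least relevant and keep
    # one only if it is not highly correlated with any feature already kept.
    candidates.sort(key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in candidates:
        if all(abs(pearson(features[name], features[k])) < redundancy_max
               for k in selected):
            selected.append(name)
    return selected
```

A model-based second stage, for example ranking the surviving features by random forest importance, would be closer in spirit to the paper's setup, but it requires a machine learning library and is not shown here.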