Prediction of stock price movement using an improved NSGA-II-RF algorithm with a three-stage feature engineering process.

Xiaohua Zeng,Changzhou Liang,Chiping Yuan,Jieping Cai,Diego Oliva

doi:10.1371/journal.pone.0287754

Open Access

https://doi.org/10.1371/journal.pone.0287754

Journal: PloS one	Publication Date: Jun 28, 2023
Citations: 1	License type: CC BY 4.0

Affiliation: Anhui Xinhua University

Abstract

Stock price prediction has long been a hot topic in the field of artificial intelligence. In recent years, computational intelligence methods such as machine learning and deep learning have been explored for prediction systems. However, accurately predicting the direction of stock prices remains a major challenge because stock prices are affected by nonlinear, nonstationary, and high-dimensional features. Previous work has largely overlooked feature engineering, and selecting the optimal feature set that affects stock prices is a promising way to address this challenge. Hence, this article proposes an improved many-objective optimization algorithm integrated with random forest (I-NSGA-II-RF), together with a three-stage feature engineering process, in order to decrease the computational complexity and improve the accuracy of the prediction system. The model is optimized in two directions: maximizing accuracy and minimizing the optimal solution set. The I-NSGA-II algorithm is improved by initializing the population with the integrated information from two filter feature selection methods and by using multiple-chromosome hybrid coding to select features and optimize model parameters simultaneously. Finally, the selected feature subset and parameters are fed to the RF for training, prediction, and iterative optimization. Experimental results show that the I-NSGA-II-RF algorithm achieves the highest average accuracy, the smallest optimal solution set, and the shortest running time compared with the unmodified multi-objective feature selection algorithm and the single-objective feature selection algorithm. Compared with the deep learning model, the proposed model is interpretable, more accurate, and faster to run.
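
The hybrid coding and the two optimization directions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the chromosome layout (a 0/1 feature mask followed by two random forest hyperparameters), the names `decode`, `objectives`, `dominates`, and `pareto_front`, the penalty for an empty feature subset, and the reading of the second objective as the number of selected features are all assumptions made for illustration. The accuracy estimate is passed in as a callable, so any classifier could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Decoded:
    """One decoded individual (hypothetical layout, not taken from the paper)."""
    feature_idx: Tuple[int, ...]  # indices of the selected features
    n_estimators: int             # assumed RF hyperparameter gene
    max_depth: int                # assumed RF hyperparameter gene


def decode(chromosome: Sequence[int], n_features: int) -> Decoded:
    """Split a hybrid chromosome into a feature mask and two parameter genes."""
    mask = chromosome[:n_features]
    n_estimators, max_depth = chromosome[n_features:n_features + 2]
    idx = tuple(i for i, bit in enumerate(mask) if bit)
    return Decoded(idx, int(n_estimators), int(max_depth))


def objectives(
    chromosome: Sequence[int],
    n_features: int,
    accuracy_fn: Callable[[Decoded], float],
) -> Tuple[float, int]:
    """Return (accuracy, number of selected features) for one individual."""
    d = decode(chromosome, n_features)
    if not d.feature_idx:
        # Assumed penalty: an empty subset cannot be trained on, so it gets
        # the worst possible value on both objectives.
        return (0.0, n_features + 1)
    return (accuracy_fn(d), len(d.feature_idx))


def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """True if a is no worse than b on both objectives and better on one.

    Accuracy is maximized; the feature count is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, int]]) -> List[int]:
    """Indices of the individuals that no other individual dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

In a full NSGA-II loop, `accuracy_fn` would typically train a random forest on the selected columns with the decoded hyperparameters and return a validation accuracy, and `pareto_front` would be applied repeatedly to build the ranked fronts used for selection. Those steps, along with crossover, mutation, and the filter-based population initialization, are omitted here.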

Full Text