US recession prediction using statistical and natural language processing methods

He Jiang, Qitong Liang, Jiarui Zheng, Ruochen Wang, Yaohao Fan, Jiawei Tian

doi:10.54254/2753-8818/19/20230535

Abstract

This study predicts recessions in the United States. We build our models on data covering more than ten recessions that the United States has experienced since the mid-20th century. Our research has two parts: a machine learning model grounded in econometric theory, and a text analysis model based on natural language processing (NLP) techniques. We collected quarterly data from January 1, 1950, to September 1, 2020, to examine each historical recessionary period. We select key macroeconomic variables, such as the real GDP growth rate, the unemployment rate, and interest rates, as inputs to the machine learning model. Depending on the data type and model accuracy, we adopted three models: Support Vector Classification (SVC), Naive Bayes, and Logistic Regression. Of these, the SVC model has the highest accuracy, above 80%. For the NLP models, we use central bank speeches published by the Bank for International Settlements (BIS). We evaluate bag-of-words and convolutional neural network models, using the loss per training epoch to determine how well the models' predictions match the actual data. Although we have debugged the NLP model many times, its accuracy has yet to exceed that of the econometric model. Effectively improving the prediction accuracy of the NLP model is the main problem we hope to solve in future work.
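To make the classification setup concrete, the following is a minimal sketch of one of the three model families named above, logistic regression, fitted by batch gradient descent. It is not the authors' implementation: the two features, the eight toy observations, and the names `train_logistic` and `predict` are assumptions made for illustration only. In practice a library such as scikit-learn would likely supply SVC, Naive Bayes, and logistic regression classifiers.

```python
import math

# Toy quarterly observations, invented for illustration (NOT the paper's data):
# (real GDP growth in %, change in unemployment rate in percentage points)
FEATURES = [
    (3.0, -0.4), (2.0, -0.2), (1.0, -0.1), (4.0, -0.5),  # expansion quarters
    (-3.0, 0.4), (-2.0, 0.2), (-1.0, 0.1), (-4.0, 0.5),  # recession quarters
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = recession, 0 = no recession


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # derivative of log loss w.r.t. score
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (recession) if the predicted probability is at least 0.5."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


weights, bias = train_logistic(FEATURES, LABELS)
predictions = [predict(weights, bias, x) for x in FEATURES]
accuracy = sum(p == y for p, y in zip(predictions, LABELS)) / len(LABELS)
print(f"training accuracy: {accuracy:.2f}")
```

The accuracy computed here is measured on the same toy rows used for fitting, so it says nothing about out-of-sample performance. A figure comparable to the reported 80% would require held-out quarters.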

Full Text