This paper examines the factors that influence the direction of the Bitcoin price from the perspective of machine learning (ML) models. The factors considered cover Bitcoin market data, technical indicators, blockchain variables, sentiment analysis, and other macro-financial variables. Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) classifiers are employed, and three train-test ratios are considered. Grid search and blocking time series cross-validation are used to tune the hyperparameters of the proposed ML algorithms, resulting in the three most accurate models for each train-test ratio. Variables that affect the next-day price direction are ranked using the best LR and RF models. To reduce dimensionality, the smallest subset of independent variables with the highest test set accuracy is chosen for each method and train-test ratio. The models show that technical indicators influence the daily Bitcoin price direction the most, followed by blockchain and Bitcoin market variables. In contrast, the models disagree on the importance of Tweets and macro-financial variables. Finally, SVM performed better on the test set when the optimal LR sets of independent variables were used, indicating that analysing the influence of individual factors on the Bitcoin price is valuable beyond the model used to rank them. Combining only the influential independent variables with a 90:10 train-test ratio yielded the greatest accuracy, 58.18%, achieved by the RF model.
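The blocking time series cross-validation mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact splitting scheme, so everything here is an assumption: the function name `blocked_time_series_splits`, the number of blocks, and the within-block training fraction are hypothetical choices. The idea shown is one common form of blocked splitting, in which the series is cut into contiguous, non-overlapping blocks and each block is validated on its own most recent observations, so no fold trains on data that comes after its validation window.

```python
from typing import Iterator, List, Tuple


def blocked_time_series_splits(
    n_samples: int, n_blocks: int = 5, train_fraction: float = 0.75
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one (train_idx, val_idx) pair per contiguous block.

    Hypothetical helper: n_blocks and train_fraction are illustrative
    defaults, not values taken from the paper.
    """
    if n_blocks < 1 or n_samples < 2 * n_blocks:
        raise ValueError("need at least two samples per block")
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")

    # Equal-sized blocks; any trailing remainder of the series is left unused.
    block_size = n_samples // n_blocks
    for b in range(n_blocks):
        start = b * block_size
        stop = start + block_size
        # Clamp the cut so both the training and validation parts are non-empty.
        n_train = min(max(int(block_size * train_fraction), 1), block_size - 1)
        cut = start + n_train
        # Earlier observations train the model, later ones validate it.
        yield list(range(start, cut)), list(range(cut, stop))


if __name__ == "__main__":
    for train_idx, val_idx in blocked_time_series_splits(100, n_blocks=5):
        print(f"train {train_idx[0]}-{train_idx[-1]}, "
              f"validate {val_idx[0]}-{val_idx[-1]}")
```

In a grid search, each candidate hyperparameter setting would be scored by averaging validation accuracy over these folds, and the held-out test set (for example the final 10% of the series under a 90:10 ratio) would be touched only once, after the best setting is chosen.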