Bitcoin, with its ever-growing popularity, has shown extreme price volatility since its inception. Some of these fluctuations have been attributed to tweets from Elon Musk, Michael Saylor, and others. In this paper, we investigate whether Twitter data can be leveraged to predict such extreme price movements. Existing social media models often take a shortcut and include only sentiment extracted from tweets. In this work, we instead embed the actual tweets in a domain-informed way and investigate whether they have an impact. We propose a multimodal deep learning model for predicting extreme price fluctuations that takes as input candlestick data, prices of a variety of correlated assets, technical indicators, and Twitter content. To train the model, we collected a new dataset of 5,000 tweets per day containing the keyword ‘Bitcoin’, covering 2015 to 2021. This dataset, called PreBit, is available online (https://www.kaggle.com/datasets/zyz5557585/prebit-multimodal-dataset-for-bitcoin-price), as is our model (https://github.com/AMAAI-Lab/PreBit). Our proposed hybrid multimodal model consists of an SVM model based on price data, which is fused with a text-based Convolutional Neural Network. In the text-based model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, to capture the full content of the tweets and feed it to the model in a form it can process. In an ablation study, we explore whether adding social media data from the general public on Bitcoin improves the model’s ability to predict extreme price movements. Finally, we propose and backtest a trading strategy based on our models’ predictions with varying prediction thresholds, and show that it can be profitable with lower risk than a ‘hold’ or moving average strategy.
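To make the idea of a threshold-based strategy concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical long-or-cash rule: hold Bitcoin on a day only when the model's predicted probability for that day reaches a chosen threshold. The function names, the input format, and the rule itself are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Sequence


def backtest_threshold(
    probs: Sequence[float], returns: Sequence[float], threshold: float
) -> float:
    """Cumulative return of a hypothetical long-or-cash rule.

    probs[t]   : assumed model probability, available before day t,
                 that day t is a favourable day to hold Bitcoin
    returns[t] : realised simple return of Bitcoin on day t
    threshold  : hold Bitcoin on day t only if probs[t] >= threshold
    """
    if len(probs) != len(returns):
        raise ValueError("probs and returns must have the same length")
    equity = 1.0
    for p, r in zip(probs, returns):
        if p >= threshold:
            equity *= 1.0 + r  # invested for this day
        # otherwise stay in cash, so equity is unchanged
    return equity - 1.0


def buy_and_hold(returns: Sequence[float]) -> float:
    """Cumulative return of the 'hold' baseline over the same days."""
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
    return equity - 1.0


# Toy inputs, invented purely for illustration.
toy_probs = [0.9, 0.2, 0.7]
toy_returns = [0.10, -0.20, 0.05]
strategy_return = backtest_threshold(toy_probs, toy_returns, threshold=0.6)
baseline_return = buy_and_hold(toy_returns)
```

Raising the threshold makes the rule trade less often, which is one way a prediction threshold can trade off return against exposure to risk. This sketch ignores transaction costs and short positions.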