Abstract

A time-series of numerical data and a sequence of time-ordered documents are often correlated. This paper aims at modeling the impact that the underlying themes discussed in the text data have on the time series. To do so, we introduce an original topic model, Time Series Impact Driven by Topic Modeling (TSITM), that includes contextual data by coupling Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set to zero the impact of uncorrelated topics. The resulting topics act as explanatory variables for the regression of the numerical time series, which allows us to understand the time series movements based on the events described on the text data. We have tested our model on two datasets: first, we used political news to explain the US president&#x2019;s disapproval ratings; then, we considered a corpus of economic news to explain the financial returns of 4 different multinational corporations. Our experiments show that an appropriate selection of hyperparameters leads to significant correlations: both an intrinsic baseline and state of the art methods were significantly outperformed by TSITM in MSE, MAE and out-of-sample <i>R</i><sup>2</sup>, according to our hypothesis tests. We believe that this framework can be useful in the context of reputational risk management.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call