Abstract

Textual data such as tweets and news is abundant on the web. However, extracting useful information from such a deluge of data is hardly possible for a human. In this paper, we discuss automated text analysis methods based on sparse optimization. In particular, we use sparse PCA and Elastic Net regression to extract intelligible topics from a large textual corpus and to obtain time-based signals that quantify the strength of each topic over time. These signals can then be used as regressors for modeling or predicting other related numerical indices. We applied this setup to the analysis of the topics that arose during the 2016 US presidential election, and we used the topic strength signals to model their influence on the election polls.
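The abstract does not spell out the algorithms, so the following is only a minimal pure-Python sketch of the kind of pipeline it describes, under our own assumptions. The corpus is a small documents-by-terms count matrix. A single sparse "topic" is found by soft-thresholded power iteration on the uncentered term co-occurrence matrix X^T X; this is a simple heuristic and not necessarily the sparse PCA formulation used in the paper. The daily topic-strength signal is taken as the mean projection of each day's documents onto that topic. Finally, a numerical index (for example, a poll series) is regressed on the signals with an Elastic Net fitted by coordinate descent, using the same penalty scaling as scikit-learn's `ElasticNet`. The abstract does not say which regression model was used at that last stage. All function names (`sparse_pc`, `topic_strength`, `elastic_net`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; this is the proximal operator of lam * |x|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_pc(X: Matrix, lam: float, iters: int = 100) -> List[float]:
    """Sparse loading vector (one weight per term) of a documents-by-terms matrix X,
    via soft-thresholded power iteration on the uncentered X^T X.
    Assumption: a simple illustrative heuristic, not the paper's exact method."""
    n, p = len(X), len(X[0])
    # Term-by-term co-occurrence matrix C = X^T X.
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        # Power step; then shrink every weight by lam, zeroing those with |weight| <= lam.
        u = [soft_threshold(sum(C[a][b] * v[b] for b in range(p)), lam) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [0.0] * p  # lam is so large that every term was zeroed out
        v = [x / norm for x in u]
    return v


def topic_strength(X: Matrix, days: List[int], v: List[float], n_days: int) -> List[float]:
    """Daily topic signal: mean projection onto v of the documents posted on each day.
    days[i] is the day index of document i; days with no documents get 0.0."""
    totals = [0.0] * n_days
    counts = [0] * n_days
    for row, d in zip(X, days):
        totals[d] += sum(x * w for x, w in zip(row, v))
        counts[d] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def elastic_net(X: Matrix, y: List[float], alpha: float, l1_ratio: float,
                iters: int = 200) -> List[float]:
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept is fitted, so center the columns of X and y beforehand."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for the current w (initially w = 0)
    for _ in range(iters):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # an all-zero column keeps weight 0
            # Correlation of column j with the residual that leaves out feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_wj - w[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta  # keep the residual in sync with w
            w[j] = new_wj
    return w
```

In a fuller version one would extract several topics (for instance by deflating X^T X after each component), stack their daily signals as the columns of the regression matrix, and center both those columns and the target index before calling `elastic_net`. The L1 part of the penalty is what lets the model drop topics that do not help explain the index.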
