Measuring Political Sentiment on Twitter: Factor Optimal Design for Multinomial Inverse Regression

Matt Taddy

doi:10.1080/00401706.2013.778791

Abstract

This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed toward particular U.S. politicians. The study requires selection of a subsample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for maximizing sampling efficiency. In particular, we outline and illustrate greedy selection of documents to build designs that are D-optimal in a topic-factor decomposition of the original text. The strategy is applied to our motivating dataset of political posts, and we outline a new technique for predicting both generic and subject-specific document sentiment through the use of variable interactions in multinomial inverse regression. Results are presented for analysis of 2.1 million Twitter posts collected around February 2012. Computer codes and data are provided as supplementary material online.
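
To make the greedy D-optimal selection mentioned above concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the article's supplementary code. It assumes a hypothetical matrix `factors` with one row of topic-factor weights per document, and adds rows one at a time so as to maximize the determinant of the information matrix. The function name `greedy_d_optimal` and the small `ridge` term, which keeps the starting matrix invertible, are assumptions introduced for this example.

```python
import numpy as np


def greedy_d_optimal(factors, n_select, ridge=1e-3):
    """Greedily pick rows of `factors` to increase det(X'X + ridge * I).

    Hypothetical helper for illustration. `factors` is an (n_docs, n_factors)
    array of document topic weights; `ridge` is an assumed regularizer.
    """
    factors = np.asarray(factors, dtype=float)
    n_docs, n_factors = factors.shape
    if n_select > n_docs:
        raise ValueError("n_select cannot exceed the number of documents")

    # Inverse of the starting information matrix, ridge * I.
    a_inv = np.eye(n_factors) / ridge
    available = np.ones(n_docs, dtype=bool)
    chosen = []

    for _ in range(n_select):
        # Matrix determinant lemma: det(A + x x') = det(A) * (1 + x' A^-1 x),
        # so the best next document maximizes x' A^-1 x.
        gain = np.einsum("ij,ij->i", factors @ a_inv, factors)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False

        # Sherman-Morrison rank-one update of the inverse.
        x = factors[best]
        u = a_inv @ x
        a_inv -= np.outer(u, u) / (1.0 + x @ u)

    return chosen
```

Calling `greedy_d_optimal(factors, n_select)` returns the indices of the documents to send for sentiment scoring, in the order they were selected. Each step costs one matrix product over the candidate set, which is why a rank-one update is used instead of recomputing a determinant per candidate.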
