Abstract

IMA Journal of Applied Mathematics (2016) 81, 409–431
doi:10.1093/imamat/hxw025
Advance Access publication on 13 July 2016

Topic time series analysis of microblogs

Eric L. Lai, Department of Mathematics, UCI, Irvine, CA 92697, USA
Daniel Moyer, Department of Computer Science, University of Southern California, Los Angeles, CA 90033, USA and Department of Mathematics, UCLA, Los Angeles, CA 90095, USA
Baichuan Yuan, Department of Mathematics, Zhejiang University, Hangzhou 310027, China and Department of Mathematics, UCLA, Los Angeles, CA 90095, USA
Eric Fox, Department of Statistics, UCLA, Los Angeles, CA 90095, USA
Blake Hunter, Mathematical Sciences, Claremont McKenna College, Claremont, CA 91711, USA and Department of Mathematics, UCLA, Los Angeles, CA 90095, USA
Andrea L. Bertozzi*, Department of Mathematics, UCLA, Los Angeles, CA 90095, USA
and P. Jeffrey Brantingham, Department of Anthropology, UCLA, Los Angeles, CA 90095, USA

*Corresponding author: bertozzi@math.ucla.edu

[Received on 24 April 2016]

©The authors 2016. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Social media data tend to cluster around events and themes. Local newsworthy events, sports team victories or defeats, abnormal weather patterns and globally trending topics all influence the content of online discussion. The automated discovery of these underlying themes from corpora of text is of interest to numerous academic fields as well as to law enforcement organizations and commercial users. One useful class of tools for such problems is topic models, which attempt to recover latent groups of word associations from the text. However, these topics may also exhibit patterns in both time and space. The recovery of such patterns complements the analysis of the text itself and in many cases provides additional context. In this work we describe two methods for mining interesting spatio-temporal dynamics and relations among topics: one that compares the topic distributions as histograms in space and time, and another that models topics over time as a temporal or spatio-temporal Hawkes process with exponential trigger functions. Both methods may be used to discover topics with abnormal distributions in space and time. The second method also allows for self-exciting topics and can recover intertopic relationships (excitation or inhibition) in both time and space. We apply these methods to a geo-tagged Twitter dataset and provide analysis and discussion of the results.

Keywords: mining complex datasets; spatial and temporal analysis; topic modeling; cluster analysis.
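The abstract's second method models a topic's activity as a Hawkes process with exponential trigger functions. As a rough illustration only, the sketch below evaluates the conditional intensity of a univariate, purely temporal, self-exciting Hawkes process. The function name `hawkes_intensity`, the parameterization (background rate `mu`, excitation weight `alpha`, decay rate `omega`) and the example event times are assumptions made for this sketch; the abstract does not give the authors' exact formulation, and inhibition and the spatial component are not modeled here.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, omega):
    """Conditional intensity of a univariate Hawkes process at time t.

    Assumed form (hypothetical parameterization, not taken from the paper):
        lambda(t) = mu + sum over t_i < t of alpha * omega * exp(-omega * (t - t_i))
    where mu is the background rate, alpha the excitation weight and
    omega the decay rate of the exponential trigger.
    """
    excitation = sum(
        alpha * omega * math.exp(-omega * (t - t_i))
        for t_i in event_times
        if t_i < t  # only past events contribute
    )
    return mu + excitation


# Made-up timestamps of posts assigned to one topic (illustrative only).
events = [1.0, 1.5, 4.0]

# Before any event, the intensity equals the background rate.
baseline = hawkes_intensity(0.5, events, mu=0.2, alpha=0.5, omega=1.0)

# Shortly after two closely spaced events, the intensity is elevated.
after_burst = hawkes_intensity(1.6, events, mu=0.2, alpha=0.5, omega=1.0)

print(baseline, after_burst)
```

In this form each past event raises the rate of future events and the effect decays exponentially, which is one way to capture the bursty, clustered activity the abstract attributes to self-exciting topics.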
