Topic extraction from millions of tweets using singular value decomposition and feature selection

Takako Hashimoto, Tetsuji Kuboyama, Basabi Chakraborty

doi:10.1109/apsipa.2015.7415451

Abstract

Social media offers a wealth of insight into how significant events, such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing, affect individuals. The scale of the available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based social media analysis methods using Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a volume of data because of their space and time complexity. To overcome this scalability problem, this paper uses redsvd, a high-performance Singular Value Decomposition (SVD) library, to identify topics over time in a data set of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. Starting from author-by-word count vectors for each time slot (in our case, every hour), author clusters are extracted from each slot by SVD and k-means. CWC, an original fast feature selection algorithm, is then used to extract discriminative words from each cluster. As a result, author clusters recognized as topics can be visualized, overcoming the scalability problem that conventional social media analysis methods face with big data.
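The per-slot pipeline described in the abstract (count vectors, SVD, k-means, discriminative words) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `randomized_svd` is a NumPy randomized truncated SVD used as a stand-in for redsvd (it does not use or mimic the redsvd API), the row normalisation and farthest-point k-means initialisation are illustrative choices, and `top_words` is a simple frequency-difference score swapped in for CWC, which it does not reproduce. All function names and the toy tweets for a single time slot are hypothetical.

```python
import numpy as np


def author_word_matrix(tweets):
    """Author-by-word count matrix for one time slot, from (author, text) pairs."""
    authors = sorted({a for a, _ in tweets})
    vocab = sorted({w for _, t in tweets for w in t.lower().split()})
    a_idx = {a: i for i, a in enumerate(authors)}
    w_idx = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(authors), len(vocab)))
    for a, t in tweets:
        for w in t.lower().split():
            X[a_idx[a], w_idx[w]] += 1
    return X, authors, vocab


def randomized_svd(X, k, oversample=5, seed=0):
    """Rank-k randomized SVD (a stand-in for redsvd, not its API): sketch the
    range of X with a random projection, then exactly decompose the small matrix."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((X.shape[1], k + oversample))
    Q, _ = np.linalg.qr(X @ omega)  # orthonormal basis for the sketched range
    Ub, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


def kmeans(Z, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def top_words(X, labels, vocab, cluster, n=3):
    """Rank words by the share of in-cluster authors using them minus the share
    of other authors using them. A simple stand-in, NOT the CWC algorithm."""
    used = X > 0
    inside = used[labels == cluster].mean(axis=0)
    others = labels != cluster
    outside = used[others].mean(axis=0) if others.any() else np.zeros(len(vocab))
    order = np.argsort(-(inside - outside), kind="stable")
    return [vocab[j] for j in order[:n]]


def extract_topics(tweets, k=2, n_words=3):
    """One time slot: count matrix -> truncated SVD -> k-means -> top words per cluster."""
    X, authors, vocab = author_word_matrix(tweets)
    U, s, _ = randomized_svd(X, k)
    Z = U * s  # author coordinates in the k-dimensional latent space
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z / np.where(norms == 0, 1.0, norms)  # row-normalise (cosine-style clustering)
    labels = kmeans(Z, k)
    clusters = {a: int(c) for a, c in zip(authors, labels)}
    words = [top_words(X, labels, vocab, c, n_words) for c in range(k)]
    return clusters, words


# Hypothetical toy data for a single hourly slot.
tweets = [
    ("a1", "tsunami warning coast evacuate"),
    ("a2", "tsunami coast evacuate now"),
    ("a3", "evacuate coast tsunami"),
    ("b1", "nuclear plant radiation fukushima"),
    ("b2", "radiation nuclear plant level"),
    ("b3", "fukushima nuclear radiation"),
]
clusters, words = extract_topics(tweets)
print(clusters)
print(words)
```

With this toy input the sketch places the `a*` authors and the `b*` authors in separate clusters, whose top words are `coast`, `evacuate`, `tsunami` and `nuclear`, `radiation`, `fukushima`. Applying it to the full data set would mean repeating this for every hourly slot. The randomized projection keeps the SVD step cheap because only the small matrix `Q.T @ X`, with at most `k + oversample` rows, is decomposed exactly.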

Similar Papers

Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique
Takako Hashimoto ... Kilho Shin
01 Nov 2015

Topic Extraction Method from Millions of Tweets Based on Fast Feature Selection Technique CWC
Takako Hashimoto ... Kilho Shin
01 Dec 2016

Invited talk-1: Event detection from millions of tweets related to disasters using high-performance feature selection technique
Takako Hashimoto
01 Dec 2015

Improve topic modeling algorithms based on Twitter hashtags
Hayder M Alash ... Ghaidaa A Al-Sultany
Journal of Physics: Conference Series | VOL. 1660
01 Nov 2020
