Abstract

Analyzing user behavior in online spaces is an important task. This paper analyzes online communities in terms of topics. We present a user–topic model based on latent Dirichlet allocation (LDA), as an application of topic modeling to a domain other than textual data. The model replaces the concept of word occurrence in the original LDA method with user participation. The proposed method addresses several problems in topic modeling and user analysis, including the inclusion of dynamic topics, visualization of user interaction networks, and event detection. We collected datasets from four online communities with different characteristics and conducted experiments that demonstrate the effectiveness of our method, revealing interesting findings covering numerous aspects.

Highlights

  • We present a preprocessing method that should be considered when working with online community datasets; it enables the proposed method to capture both the temporal and thematic features of user behavior effectively

  • We performed topic modeling with the number of topics ranging from 2 to 64

  • Because user–topic modeling assumes that users with similar latent topics are likely to engage with the same article, the proposed method can be used for user clustering

Summary

Motivation

The online community is an important virtual space where information spreads and users express their opinions and emotions. Instead of developing a complex probabilistic model for analyzing user behavior, we adopt a simple form of the topic modeling method to ensure flexibility, while making one important and effective substitution. Although the concept itself was introduced in our previous work [18], this work provides extensive experimental results, focusing in particular on demonstrating that the method can analyze user behavior in web communities from many perspectives that the previous work did not cover. The proposed method is simple because it uses standard LDA with the small substitution of user participation for word occurrence. It is flexible because it does not use too many features, yet it ensures sufficient functionality in many applications of user behavior analysis.
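To make the substitution concrete, the following is a minimal sketch, not the authors' implementation. It runs a textbook collapsed Gibbs sampler for LDA in which each article plays the role of a document and each participation event by a user plays the role of a word token. The function name `user_topic_lda`, the toy data, the hyperparameter values (`alpha`, `beta`, `iters`), and the choice of what counts as a participation event are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def user_topic_lda(articles, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with users in place of words.

    articles: list of lists; each inner list holds the user IDs that
    participated in one article (one entry per participation event).
    Returns (phi, theta): per-topic user distributions and per-article
    topic distributions. Hyperparameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    users = sorted({u for art in articles for u in art})
    num_users = len(users)

    # Count tables used by the sampler.
    article_topic = [[0] * num_topics for _ in articles]
    topic_user = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic for every participation event.
    for a, art in enumerate(articles):
        z_a = []
        for u in art:
            z = rng.randrange(num_topics)
            z_a.append(z)
            article_topic[a][z] += 1
            topic_user[z][u] += 1
            topic_total[z] += 1
        assignments.append(z_a)

    for _ in range(iters):
        for a, art in enumerate(articles):
            for i, u in enumerate(art):
                # Remove the current assignment from the counts.
                z = assignments[a][i]
                article_topic[a][z] -= 1
                topic_user[z][u] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this event.
                weights = [
                    (article_topic[a][k] + alpha)
                    * (topic_user[k][u] + beta)
                    / (topic_total[k] + beta * num_users)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[a][i] = z
                article_topic[a][z] += 1
                topic_user[z][u] += 1
                topic_total[z] += 1

    # Smoothed estimates of the two distributions.
    phi = [
        {u: (topic_user[k][u] + beta) / (topic_total[k] + beta * num_users)
         for u in users}
        for k in range(num_topics)
    ]
    theta = [
        [(article_topic[a][k] + alpha) / (len(art) + alpha * num_topics)
         for k in range(num_topics)]
        for a, art in enumerate(articles)
    ]
    return phi, theta


# Hypothetical toy data: four articles and the users who commented on them.
articles = [
    ["ann", "bob", "ann"],
    ["bob", "ann"],
    ["cho", "dee"],
    ["dee", "cho", "cho"],
]
phi, theta = user_topic_lda(articles, num_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist.items(), key=lambda kv: -kv[1]))
```

In this reading, each row of `phi` ranks users by how strongly they are associated with a latent topic, and each row of `theta` describes the topic mixture of one article. Grouping users by their highest-probability topic is one simple way such output could be used for clustering.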

Related Work
Topical User Modeling
Analogy
Preliminaries
Preprocessing
Topic Assignment with Gibbs Sampling
Dataset Description
Experimental Results
Thematic and Temporal Analysis
Temporal Topic Flow
Qualitative Topic Assessment from Top Articles
Comparison with Textual Topic Modeling
User Clustering
Visual Analytics with User Replying Network
Clustering Coefficient
Temporal Behavior of Community Users
Herd Behavior and Event Detection
Comparison with Topic Models on User Network
Conclusions