Abstract

Online forums provide a popular platform for users to communicate and share experiences. Unlike social networks that require users to register with their real identities, online forums have no such requirement, which makes their users more willing to speak the truth. Unfortunately, this anonymity also makes online forums more likely to be abused by malicious users. The Internet water army is a typical harmful phenomenon: it damages the interests of ordinary Internet users and legitimate companies, and it is also detrimental to social stability and national security. Finding and removing the Internet water army from online forums in a timely manner is therefore important for improving the user experience, increasing the credibility of online information, and maintaining cyberspace security. In this paper, we propose a novel divide-and-conquer algorithm for detecting the Internet water army in online forums, based on the observation that its members always appear in groups, echo each other, and work in collusion. The major innovations of this paper can be summarized in the following four points. First, we propose a new measure of behavior similarity for online forum users, which compares the behaviors of user pairs in three aspects. Second, we put forward a social network model in which an edge is built between two users if they have similar behaviors. We then prune the network by deleting the edges whose similarity is below a certain threshold and apply a hierarchical clustering algorithm to the pruned network to find groups of users who cooperate closely. Third, we divide the whole dataset into many small subsets according to discussion thread ID and process all the subsets in parallel, which greatly reduces the time complexity. We evaluate our method on a real dataset from Sina Forum, and the experimental results show that our algorithm detects the Internet water army in online forums effectively and with high accuracy.
We also conduct an empirical analysis of the Internet water army we detected and find that their behavior patterns differ markedly from those of normal users. These findings verify the correctness of our algorithm and lay the foundation for characteristic-based Internet water army detection.
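
The pipeline described above (pairwise similarity, threshold pruning, grouping, and per-thread parallelism) can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration. The abstract does not specify the three aspects of the similarity measure, so word-set Jaccard similarity is used as a stand-in. Single-linkage grouping via union-find, which on a pruned graph is equivalent to taking connected components, replaces the hierarchical clustering step. The record layout `(thread_id, user_id, text)`, the function names, and the default threshold of 0.5 are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    """Word-set Jaccard similarity (a stand-in for the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_groups_in_thread(posts, threshold=0.5, min_size=2):
    """Group users of one thread whose similarity reaches the threshold.

    posts: list of (user_id, text) pairs from a single discussion thread.
    """
    # Collect each user's vocabulary within this thread.
    words = defaultdict(set)
    for user, text in posts:
        words[user].update(text.lower().split())

    # Union-find over users; each kept edge merges two groups.
    parent = {user: user for user in words}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Pruning: only pairs at or above the threshold create an edge.
    for u, v in combinations(sorted(words), 2):
        if jaccard(words[u], words[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for user in words:
        groups[find(user)].add(user)
    # Singletons are not colluding groups, so drop them.
    return [g for g in groups.values() if len(g) >= min_size]


def detect_groups(all_posts, threshold=0.5):
    """Split posts by thread ID and process the threads in parallel.

    all_posts: iterable of (thread_id, user_id, text) triples.
    """
    by_thread = defaultdict(list)
    for thread_id, user, text in all_posts:
        by_thread[thread_id].append((user, text))

    thread_ids = sorted(by_thread)
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda t: detect_groups_in_thread(by_thread[t], threshold),
            thread_ids,
        )
        return dict(zip(thread_ids, results))


posts = [
    ("t1", "a", "great product buy now"),
    ("t1", "b", "great product buy now"),
    ("t1", "c", "I disagree with this review entirely"),
    ("t2", "d", "hello"),
    ("t2", "e", "goodbye"),
]
suspicious = detect_groups(posts)
```

Because users are compared only within a thread, the number of pairwise comparisons depends on thread sizes rather than on the total number of users, which is where the divide-and-conquer saving comes from. A thread pool keeps the sketch short; for CPU-bound workloads a process pool would likely be the better fit.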

