A study on real-time low-quality content detection on Twitter from the users' perspective.

Weiling Chen, Chiew Tong Lau, Chai Kiat Yeo, Bu Sung Lee, Hussein Suleman

doi:10.1371/journal.pone.0182487

Open Access

https://doi.org/10.1371/journal.pone.0182487

Journal: PLOS ONE
Publication Date: Aug 9, 2017
Citations: 26
License type: CC BY 4.0

Affiliation: Nanyang Technological University

Abstract

Techniques for detecting malicious content such as spam and phishing on Online Social Networks (OSN) are common, but little attention has been paid to other types of low-quality content, which actually affect users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, the Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on the different categories of low-quality content. Both direct and indirect features, including newly proposed features, are identified to characterize all types of low-quality content. We then combine word-level analysis with the identified features and build a keyword blacklist dictionary to improve detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word-level analysis. Our results show that, using a random forest classifier, our method achieves an accuracy of 0.9711 and an F1 score of 0.8379 while detecting low-quality content in tweets in real time. Our work therefore has a positive impact on user experience in browsing social media content.
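To make the keyword blacklist and the two reported metrics concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the blacklist words, the example tweets, their labels, and the threshold rule are all hypothetical, and a simple "two or more blacklist hits" rule stands in for the random forest classifier described in the abstract. It shows how a blacklist hit count can serve as a word-level feature and how accuracy and F1 are computed from the resulting predictions.

```python
import re

# Hypothetical blacklist for illustration only; the paper's actual
# keyword dictionary is not reproduced here.
BLACKLIST = {"free", "win", "click", "giveaway"}


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def blacklist_hits(text, blacklist=BLACKLIST):
    """Count the tokens of a tweet that appear in the blacklist."""
    return sum(1 for tok in tokenize(text) if tok in blacklist)


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 = low-quality."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Made-up tweets with made-up labels (1 = low-quality, 0 = normal).
tweets = [
    ("Click here to win a free phone", 1),
    ("Great talk on distributed systems today", 0),
    ("Free giveaway, click now", 1),
    ("Lunch with the team", 0),
    ("Follow me and I follow back", 1),
    ("You can win if you keep practicing", 0),
]

# Stand-in rule (an assumption, not the paper's classifier):
# flag a tweet when it contains at least two blacklisted words.
y_true = [label for _, label in tweets]
y_pred = [1 if blacklist_hits(text) >= 2 else 0 for text, _ in tweets]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

On this toy data the rule misses the fifth tweet, which contains no blacklisted word. That gap is one reason a keyword list alone is usually combined with other features, as the abstract describes. The sketch also shows why accuracy and F1 can differ: accuracy counts every correct prediction, while F1 depends only on precision and recall for the low-quality class.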

Highlights

Online Social Networks (OSN) in the Web 2.0 era have developed from monotonous social interactions and communication into an integration of social media functions for all kinds of services [1].
We designed a survey according to the cluster analysis and put it online. Participants answered two questions on personal information (age and gender) and eight questions on online social networks and low-quality content.
The method we propose in this paper tackles the problem in a holistic manner: the low-quality content we detect covers valueless content of different types from the users’ perspective and includes spam and phishing, which are commonly covered by existing works.

Summary

Introduction

Online Social Networks (OSN) in the Web 2.0 era have developed from monotonous social interactions and communication into an integration of social media functions for all kinds of services [1]. More and more social network sites have sprung up and attracted millions of users. With their fast growth, OSN have become a new target for many cyber criminals, such as spammers and phishers, as well as for many advertisers, which has resulted in worrying issues. Spam is usually designed to make potential victims spend money on fake or counterfeit products and services, or is simply outright fraud [3].



