Abstract

Content originality detection is an interesting research topic in large-scale scenarios, especially on social media, where anyone can produce and disseminate content in different forms through their profiles and activities. What these communication sites lack is a way to identify original content producers: some users spread information copied from other users without indicating its original producer or where they found it. This paper presents a conceptual approach to content originality detection and illustrates the efficiency of the model by applying it to a Twitter dataset. The approach combines users' linguistic features with their online circadian behaviors to accurately identify the content originator for a given text. The proposed approach is evaluated using the F1-measure, and the results indicate an accuracy of 95% or higher in all test scenarios. Having achieved high accuracy in the test results, we applied the approach, as a use case, to news agencies popular worldwide, identifying news producers and consumers by analyzing their Tweets. We investigated news flows within and between several major news agencies in our dataset. Our results show that the proposed approach can distinguish News Story Tellers from News Propagators in the news agency community and provides information that helps to understand the flow patterns between different news groups.
