Abstract

This paper presents a novel system, the synthetic high-fidelity social media data generator (SHIELD), for generating synthetic social media data. SHIELD jointly generates time-varying, directed, and weighted interaction graph structures and topic-driven text features similar to those of the input social media data. A synthetic interaction graph is generated by a social network model to minimize the distance to the real graph, and is then enhanced by adding various patterns, such as anomalies and information cascades, interaction types, and temporal dynamics. A synthetic text generator based on the $n$-gram Markov model is trained for each topic identified by topic modeling. Synthetic text and graph structures are combined through the assignment of synthetic social media entities. An extensive performance evaluation via graph and text analysis is provided to demonstrate the statistical fidelity of the large-scale synthetic data generated by SHIELD. A data evaluation exercise with human participants is conducted to determine how difficult it is for a human to distinguish between tweets generated by SHIELD and tweets posted by real users. Experimental results, followed by a statistical significance analysis, show that human participants cannot reliably distinguish between real and synthetic tweets.
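To make the per-topic text generation step concrete, the following is a minimal sketch of an $n$-gram Markov text generator trained separately for each topic. It is an illustration under stated assumptions, not SHIELD's implementation: the function names (`train_ngram_model`, `train_per_topic`, `generate`), the whitespace tokenization, the `<s>`/`</s>` boundary markers, and the assumption that texts arrive already grouped by topic are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def train_ngram_model(texts, n=2):
    """Map each (n-1)-token context to the list of tokens observed after it.

    Hypothetical helper: whitespace tokenization and the <s>/</s> markers
    are assumptions of this sketch, not details taken from the paper.
    """
    model = defaultdict(list)
    for text in texts:
        tokens = ["<s>"] * (n - 1) + text.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context].append(tokens[i + n - 1])
    return model


def train_per_topic(texts_by_topic, n=2):
    """Train one n-gram model per topic (topics assumed to be given)."""
    return {topic: train_ngram_model(texts, n)
            for topic, texts in texts_by_topic.items()}


def generate(model, n=2, max_tokens=30, rng=None):
    """Sample a token sequence by walking the Markov chain from the start context."""
    rng = rng or random.Random()
    context = ("<s>",) * (n - 1)
    output = []
    for _ in range(max_tokens):
        candidates = model.get(context)
        if not candidates:
            break  # unseen context: stop rather than guess
        token = rng.choice(candidates)  # repeats in the list act as frequency weights
        if token == "</s>":
            break
        output.append(token)
        context = (context + (token,))[1:]  # slide the context window
    return " ".join(output)
```

Storing every observed continuation in a list, duplicates included, means that uniform sampling from the list reproduces the empirical transition frequencies without computing probabilities explicitly. A real system would likely add smoothing and a topic model to produce the grouping, both of which are omitted here.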
