Abstract

Social networks have become ubiquitous in modern society, which makes social network monitoring a research area of significant practical importance. Social network data consist of social interactions between pairs of individuals that are temporally aggregated over a certain interval of time, and the level of such temporal aggregation can have a substantial impact on social network monitoring. There have been several studies on the effect of temporal aggregation in the process monitoring literature, but none on its effect in social network monitoring. We use the degree-corrected stochastic block model (DCSBM) to simulate social networks and network anomalies, and we analyze these networks in the context of both count and binary network data. In conjunction with this model, we use the Priebe scan method as the monitoring method. We demonstrate that temporal aggregation at high levels leads to a considerable decrease in the ability to detect an anomaly within a specified time period. Moreover, converting social network communication data from counts to binary indicators can result in a significant loss of information, hindering detection performance. Aggregation at an appropriate level with count data, however, can amplify the anomalous signal generated by network anomalies and improve detection performance. Our results provide both insights on the practical effects of temporal aggregation and a framework for the study of other combinations of network models, surveillance methods, and types of anomalies.
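To make the notions of temporal aggregation and binary conversion concrete, the following is a minimal sketch rather than the authors' simulation code. It assumes Poisson-distributed edge counts with mean theta_i * theta_j * B[g_i, g_j], a common DCSBM formulation; the function names, block sizes, propensities, and window width are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_dcsbm_counts(theta, blocks, B, rng):
    """Draw one symmetric count adjacency matrix (assumed Poisson edges).

    Edge (i, j) has mean theta[i] * theta[j] * B[blocks[i], blocks[j]].
    """
    mean = np.outer(theta, theta) * B[np.ix_(blocks, blocks)]
    upper = np.triu(rng.poisson(mean), k=1)  # keep i < j only, so no self-loops
    return upper + upper.T                   # mirror to get an undirected network

def aggregate(snapshots, width):
    """Sum consecutive snapshots over non-overlapping windows of `width`."""
    n_windows = len(snapshots) // width
    trimmed = snapshots[: n_windows * width]  # drop an incomplete final window
    return trimmed.reshape(n_windows, width, *snapshots.shape[1:]).sum(axis=1)

def binarize(counts):
    """Replace each count with an indicator of at least one interaction."""
    return (counts > 0).astype(int)

# Hypothetical setup: 10 nodes in 2 blocks, 12 base-level time periods.
rng = np.random.default_rng(0)
blocks = np.repeat([0, 1], 5)            # block membership of each node
theta = np.ones(10)                      # degree-correction parameters
B = np.array([[0.5, 0.1], [0.1, 0.5]])   # block-to-block propensities

snapshots = np.stack(
    [simulate_dcsbm_counts(theta, blocks, B, rng) for _ in range(12)]
)
aggregated = aggregate(snapshots, width=4)  # 3 aggregated count networks
binary = binarize(aggregated)               # same networks as binary data
```

Summing preserves the total number of interactions across aggregation levels, whereas the binary version records only whether a pair interacted at all in a window. Wider windows therefore push more binary entries toward 1, which is one way to see the loss of information described above.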

Highlights

  • The availability of network data has increased dramatically in the last decade or so due to developments in communication technology

  • The rest of the paper is organized as follows: we provide a brief review of some network terminology, followed by a description of the degree-corrected stochastic block model (DCSBM) used to simulate the social network data, as well as the scan method used to monitor the networks

  • We have demonstrated how aggregating social network data at different levels affects the performance of Priebe’s scan method



Introduction

The availability of network data has increased dramatically in the last decade or so due to developments in communication technology. These data originate from a variety of sources, such as cell phone networks, social media, and other internet-based communications. The velocity and volume at which these data are generated pose difficult challenges for researchers. Statistical analysis of networks has recently received increased emphasis in the statistics literature, leading to the development of a rich toolbox of network.

