Classifying latent infection states in complex networks

Yeon-Sup Lim, Don Towsley, Bruno Ribeiro

doi:10.1186/s40649-015-0015-6

Abstract

Algorithms for identifying the infection states of nodes in a network are crucial for understanding and containing infections. Often, however, only a relatively small set of nodes has a known infection state. Moreover, the length of time that each node has been infected is also unknown. These missing data (the infection state of most nodes and the infection time of the unobserved infected nodes) pose a challenge to the study of real-world cascades. In this work, we develop techniques to identify the latent infected nodes in the presence of missing infection time-and-state data. Based on the likely epidemic paths predicted by the simple susceptible-infected epidemic model, we propose a measure (Infection Betweenness) for uncovering these unknown infection states. Our experimental results using machine learning algorithms show that Infection Betweenness is the most effective feature for identifying latent infected nodes.

Highlights

Networks are underlying mediums for the spread of epidemics such as diseases, rumors, and computer viruses
We evaluate classifiers based on Naive Bayes, Naive Bayes with kernel density estimation [1], and decision trees [2], combined with features that include Infection Betweenness and other centrality measures, for inferring unknown infection states
We evaluate how well Naive Bayes (NB) and Naive Bayes with kernel density estimation (NBK) classify the states of unobserved nodes; kernel density estimation uses multiple (Gaussian) distributions and is generally more effective than using a single (Gaussian) distribution [1]
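The difference between NB and NBK can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single feature, the toy feature values, the class priors, and the bandwidth of 0.05 are all assumptions made for illustration. NB fits one Gaussian per class and feature, while NBK places one Gaussian kernel on every training sample and averages them.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_single_gaussian(samples):
    """NB-style density: one Gaussian fitted to all samples of a class."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    std = max(math.sqrt(var), 1e-6)  # guard against zero variance
    return lambda x: gaussian_pdf(x, mean, std)

def fit_kde(samples, bandwidth):
    """NBK-style density: average of one Gaussian kernel per sample."""
    n = len(samples)
    return lambda x: sum(gaussian_pdf(x, s, bandwidth) for s in samples) / n

def predict(x, densities, priors):
    """Naive Bayes decision: argmax of log prior + sum of log feature densities."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for value, density in zip(x, densities[label]):
            score += math.log(max(density(value), 1e-300))  # avoid log(0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical one-dimensional feature values (for example, a centrality score).
# The infected class is deliberately bimodal.
infected_scores = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
uninfected_scores = [0.45, 0.50, 0.55]
priors = {"infected": 6 / 9, "uninfected": 3 / 9}

nb = {"infected": [fit_single_gaussian(infected_scores)],
      "uninfected": [fit_single_gaussian(uninfected_scores)]}
nbk = {"infected": [fit_kde(infected_scores, bandwidth=0.05)],
       "uninfected": [fit_kde(uninfected_scores, bandwidth=0.05)]}
```

On this toy data both variants agree at clear-cut values such as 0.85 or 0.50, but they disagree at 0.35: the single Gaussian fitted to the bimodal infected class is wide enough to cover that value, whereas the kernel estimate assigns it little infected mass. Which answer is right depends on the true distribution, so this only shows how the two density models can differ.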

Summary

Introduction

Networks are underlying mediums for the spread of epidemics such as diseases, rumors, and computer viruses. Determining the infection states of network nodes is the first step to taking corrective or preventive action to stop or slow the spread of an epidemic. The infection states of network nodes are often unknown. For example, when computer malware (say, a contaminated email attachment) spreads through a large organization, a network IT specialist will likely inspect only the computers of users who open trouble tickets. A similar problem occurs with the spread of rumors over online social networks. The problem of effectively identifying the infection states of unobserved nodes given a set of observed nodes is of central importance in the study of infection cascades. Our research question is: Given a set of nodes with known infection states and the network topology, can we correctly uncover the unknown infection states of the remaining nodes?
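The problem setting can be made concrete with a small sketch, given here under stated assumptions rather than as the authors' experimental setup. It uses a discrete-time susceptible-infected (SI) process with a fixed per-edge infection probability; the time discretization, the ring graph, and the names simulate_si and observe are all hypothetical choices for illustration. After the cascade, only a fraction of nodes reveal their state, and the remaining states are the unknowns to be inferred.

```python
import random

def simulate_si(adjacency, seed_node, infection_prob, steps, rng):
    """Discrete-time SI spread: infected nodes never recover, and in each step
    every infected node infects each susceptible neighbor with infection_prob."""
    infected = {seed_node}
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adjacency[u]:
                if v not in infected and rng.random() < infection_prob:
                    newly_infected.add(v)
        infected |= newly_infected
    return infected

def observe(nodes, infected, fraction, rng):
    """Reveal the true state of a random fraction of nodes (at least one)."""
    k = max(1, int(fraction * len(nodes)))
    observed = rng.sample(sorted(nodes), k)
    return {v: (v in infected) for v in observed}

# Toy topology: a ring of 10 nodes, each linked to its two neighbors.
n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

rng = random.Random(0)
infected = simulate_si(ring, seed_node=0, infection_prob=0.5, steps=3, rng=rng)
known_states = observe(ring.keys(), infected, fraction=0.3, rng=rng)
unknown_nodes = [v for v in ring if v not in known_states]
```

Here known_states plays the role of the observed nodes and unknown_nodes are the nodes whose infection states a classifier would have to predict from the topology and the observations.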

Objectives

Results

Conclusion