Abstract

Background: Real-world networks, such as social and communication networks, are too large to be observed entirely. Such networks are often only partially observed, so the size, topology, and nodes of the original network are unknown. Analysis of partially observed data may lead to incorrect conclusions.

Methods: We assume that we are given an incomplete snapshot of a large network and that additional nodes can be discovered by querying nodes in the currently observed network. The goal is to maximize the number of observed nodes within a given query budget: which set of nodes should be queried to maximize the size of the observed network? We formulate this problem as an exploration-exploitation problem and propose a novel nonparametric multi-armed bandit (MAB) algorithm for identifying which nodes to query.

Results: Our proposed nonparametric multi-armed bandit algorithm outperforms existing state-of-the-art algorithms, discovering over 40% more nodes in synthetic and real-world networks. Moreover, we provide a theoretical guarantee that the proposed algorithm has sublinear regret.

Conclusions: Our results demonstrate that multi-armed bandit based algorithms are better suited than heuristic-based algorithms for exploring partially observed networks.
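
The query process described under Methods can be simulated in a few lines. This is a minimal sketch under assumed conventions, not the authors' code: the hidden network is an adjacency dictionary, a query on a node returns all of that node's neighbors, the reward of a query is the number of previously unseen nodes it reveals, and the initial snapshot is simply a set of seed nodes. `explore`, `lowest_id_policy`, and the `choose` argument are hypothetical names; a bandit algorithm would be plugged in as `choose`.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical helper names for illustration; not taken from the paper.
Graph = Dict[int, Set[int]]
# A policy maps (observed graph, already-queried nodes) to the next node to query.
Policy = Callable[[Graph, Set[int]], int]


def explore(hidden: Graph, seeds: Set[int], budget: int,
            choose: Policy) -> Tuple[Set[int], List[int]]:
    """Spend up to `budget` queries; return the observed nodes and per-query rewards.

    Assumed model: querying node v reveals every neighbor of v in `hidden`,
    and the reward of that query is the number of nodes seen for the first time.
    """
    observed: Graph = {v: set() for v in seeds}   # initial snapshot: seed nodes only
    queried: Set[int] = set()
    rewards: List[int] = []
    for _ in range(budget):
        if len(queried) == len(observed):
            break                                 # every observed node was already queried
        v = choose(observed, queried)
        queried.add(v)
        before = len(observed)
        for u in hidden[v]:                       # the query returns v's full neighbor list
            observed.setdefault(u, set()).add(v)
            observed[v].add(u)
        rewards.append(len(observed) - before)    # number of newly discovered nodes
    return set(observed), rewards


def lowest_id_policy(observed: Graph, queried: Set[int]) -> int:
    """Toy policy: query the unqueried observed node with the smallest id."""
    return min(v for v in observed if v not in queried)
```

On the path graph 0-1-2-3, starting from node 0 with a budget of 2, `explore` returns `({0, 1, 2}, [1, 1])`. The per-query reward list is what a bandit policy would learn from; a heuristic policy ignores it.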

Highlights

  • Interactions among different entities in many real-world complex systems are often represented by networks, where the entities are represented by nodes and the interactions among them are represented as links between entities.

  • We provide a proof that the regret of the proposed bandit algorithm is sublinear.

  • Using the proposed KNN-UCB (k-nearest neighbor upper confidence bound) algorithm on synthetic networks and real-world networks from different domains, we demonstrate that our proposed method performs significantly better than existing methods.

  • Avrachenkov et al (2014) propose Maximum Expected Uncovered Degree (MEUD), a greedy algorithm for selecting which node to probe. This algorithm requires the degree distribution of the original network to be known. When this requirement is not fulfilled, it reduces to the Maximum Observed Degree (MOD) algorithm, which greedily chooses the node with the largest observed degree.
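
A k-nearest-neighbor UCB selection rule can be sketched as follows. This is one plausible reading, not the paper's exact definition: it assumes each candidate node is summarized by a numeric feature vector (for example, its observed degree), that past queries are stored as (features, reward) pairs, and that the exploration bonus is a constant `c` times the mean distance to the k nearest past queries. `knn_ucb_score` and `select` are hypothetical names.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

# Sketch only: the feature choice and the bonus term are assumptions,
# not the definitions used in the paper.
Features = Tuple[float, ...]
History = Sequence[Tuple[Features, float]]   # (features of a queried node, reward it produced)


def knn_ucb_score(x: Features, history: History, k: int = 3, c: float = 1.0) -> float:
    """Optimistic reward estimate for a candidate node with features x.

    Estimate: mean reward of the k past queries closest to x in feature space.
    Bonus (assumed form): c times their mean distance to x, so candidates unlike
    anything queried so far receive a larger bonus.
    """
    if len(history) < k:
        return math.inf                              # not enough data yet: force exploration
    nearest = sorted(history, key=lambda h: math.dist(x, h[0]))[:k]
    mean_reward = sum(r for _, r in nearest) / k
    mean_dist = sum(math.dist(x, f) for f, _ in nearest) / k
    return mean_reward + c * mean_dist


def select(candidates: Dict[Hashable, Features], history: History,
           k: int = 3, c: float = 1.0) -> Hashable:
    """Return the candidate node with the largest optimistic score."""
    return max(candidates, key=lambda v: knn_ucb_score(candidates[v], history, k, c))
```

For example, with history `[((1.0,), 2.0), ((2.0,), 3.0), ((8.0,), 9.0)]`, `k=2` and `c=0.0`, candidate `(1.5,)` scores 2.5 and candidate `(7.5,)` scores 6.0, so `select` picks the latter. Returning infinity until k queries exist is a simple way to force initial exploration; in the network setting the features would need recomputing after each query, since observed degrees change.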

Summary

Introduction

Interactions among different entities in many real-world complex systems are often represented by networks, where the entities are represented by nodes and the interactions among them are represented as links between entities. Data acquisition is typically done using the Application Programming Interfaces (APIs) offered by the respective social networking services. Using these APIs is often time-consuming, and the number of nodes (e.g., profiles) that can be queried within a given time is restricted.

Snowball sampling (Lee et al 2006) and random walk based sampling algorithms (Cooper et al 2016) can be used when information about the complete network is not accessible. Such algorithms suffer from the same drawback as heuristic algorithms: they do not adapt as the observed information is updated. Avrachenkov et al (2014) propose Maximum Expected Uncovered Degree (MEUD), a greedy algorithm for selecting which node to probe. This algorithm requires the degree distribution of the original network to be known; when it is not, MEUD reduces to the Maximum Observed Degree (MOD) algorithm, which greedily chooses the node with the largest observed degree. We use MOD as a baseline algorithm in our experiments and show that our proposed algorithm significantly outperforms MOD in synthetic and real-world networks.
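
Snowball sampling and MOD, both mentioned above, can be contrasted on a toy graph. The sketch assumes a simplified query model (a query returns all neighbors of the queried node) and uses hypothetical function names. Snowball sampling is rendered as breadth-first querying in discovery order, and MOD greedily queries the unqueried node with the largest observed degree, with ties broken toward the smallest node id by assumption.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def reveal(hidden: Graph, observed: Graph, v: int) -> None:
    """Assumed query model: querying v adds all of v's hidden edges to the observed graph."""
    for u in hidden[v]:
        observed.setdefault(u, set()).add(v)
        observed.setdefault(v, set()).add(u)


def snowball(hidden: Graph, seed: int, budget: int) -> Set[int]:
    """Snowball sampling: query nodes in the order they were discovered (breadth-first)."""
    observed: Graph = {seed: set()}
    queue = deque([seed])
    enqueued = {seed}
    for _ in range(budget):
        if not queue:
            break
        v = queue.popleft()
        reveal(hidden, observed, v)
        for u in sorted(observed[v]):        # sorted only to make the order reproducible
            if u not in enqueued:
                enqueued.add(u)
                queue.append(u)
    return set(observed)


def mod(hidden: Graph, seed: int, budget: int) -> Set[int]:
    """Maximum Observed Degree: query the unqueried node with the most observed neighbors."""
    observed: Graph = {seed: set()}
    queried: Set[int] = set()
    for _ in range(budget):
        candidates = [v for v in observed if v not in queried]
        if not candidates:
            break
        # Ties go to the smallest node id (an arbitrary choice for this sketch).
        v = max(candidates, key=lambda n: (len(observed[n]), -n))
        queried.add(v)
        reveal(hidden, observed, v)
    return set(observed)


# Toy graph: node 4 is a hub reachable from nodes 1 and 2; node 3 is a dead end.
toy: Graph = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 4}, 3: {0},
              4: {1, 2, 5, 6, 7}, 5: {4}, 6: {4}, 7: {4}}
print(len(snowball(toy, 0, 4)), len(mod(toy, 0, 4)))   # 5 8
```

With four queries from node 0, snowball sampling spends its last query on the dead end 3 because of discovery order, while MOD moves to node 4 once two queried nodes point to it. This is a single hand-built example, not evidence about general performance; neither heuristic learns from the rewards of earlier queries.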
