Abstract

The Emerging Infections Network (EIN) (http://ein.idsociety.org/) is a CDC-supported “sentinel” network, currently of over 1400 members, designed to connect clinical infectious disease specialists and public health officials. Members communicate primarily through an EIN-managed listserv, where they discuss disease outbreaks, treatment protocols, the effectiveness of vaccinations, and other disease-control and prevention mechanisms. Recently, researchers at Google and Yahoo! Research have used search engine query logs to tap into the online “wisdom of crowds” and produce disease outbreak trends for flu. Following this work, there is now interest in monitoring EIN discussions more carefully in order to disseminate timely and accurate information on clinical events of possible interest to health officials. We model the problem of monitoring a listserv such as the EIN as a type of budgeted maximum coverage problem that we call Budgeted Maximization with Overlapping Costs (BMOC). Although BMOC seems superficially similar to the budgeted maximum coverage problem considered by Khuller et al. (Inf. Process. Lett., 1999), it is fundamentally different from an algorithmic point of view because of its cost structure. We observe that the greedy algorithm that provides a constant-factor approximation to the budgeted maximum coverage problem can be arbitrarily bad for BMOC. We also present a reduction to BMOC from the k-densest subgraph problem, which suggests that obtaining a constant-factor approximation for our problem might be quite challenging. Nevertheless, experimental runs of the greedy algorithm on the EIN data show that it performs remarkably well relative to the optimal solution (OPT). We identify a feature of our EIN data, which we call the overlap condition, and show that the greedy algorithm does yield a constant-factor approximation guarantee when the overlap condition is satisfied.
Using an implementation of the greedy algorithm for BMOC on the EIN data, we identify small sets of “bellwether” users who are good predictors of important discussions. We provide evidence that tracking just these users significantly reduces the cost of monitoring the EIN without causing any important discussions to be missed.
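
To make the greedy idea concrete, here is a minimal sketch in Python. The abstract does not give the formal definition of BMOC, so the model below is an assumption made only for illustration: each user participates in a set of threads, each thread has a reading cost that is paid once no matter how many selected users appear in it (the "overlapping" cost), and each thread has a value reflecting its importance. The function name `greedy_overlapping_cost` and all data names are hypothetical. The rule shown, picking the user with the best ratio of newly covered value to newly incurred cost, is the standard density-greedy heuristic and is not necessarily the exact variant used by the authors; as the abstract notes, greedy carries no constant-factor guarantee for BMOC in general.

```python
def greedy_overlapping_cost(user_threads, thread_cost, thread_value, budget):
    """Density-greedy selection of users under an assumed overlapping-cost model.

    user_threads: dict mapping user -> set of thread ids the user posts in
    thread_cost:  dict mapping thread id -> cost of reading that thread
    thread_value: dict mapping thread id -> importance of that thread
    budget:       maximum total reading cost allowed
    """
    chosen = []          # users selected so far, in selection order
    paid = set()         # threads whose cost has already been paid
    spent = 0
    remaining = set(user_threads)

    while remaining:
        best, best_ratio, best_new, best_cost = None, 0.0, None, 0
        for user in sorted(remaining):  # sorted for deterministic tie-breaking
            new = user_threads[user] - paid
            extra_cost = sum(thread_cost[t] for t in new)
            gain = sum(thread_value.get(t, 0) for t in new)
            # Skip users that add no value or would exceed the budget.
            if gain == 0 or spent + extra_cost > budget:
                continue
            ratio = gain / extra_cost if extra_cost > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio, best_new, best_cost = user, ratio, new, extra_cost
        if best is None:
            break  # no affordable user adds value
        chosen.append(best)
        paid |= best_new
        spent += best_cost
        remaining.remove(best)

    value = sum(thread_value.get(t, 0) for t in paid)
    return chosen, spent, value


# Toy data, invented for illustration only.
user_threads = {"alice": {"t1", "t2"}, "bob": {"t2", "t3"}, "carol": {"t4"}}
thread_cost = {"t1": 2, "t2": 1, "t3": 2, "t4": 5}
thread_value = {"t1": 1, "t2": 3, "t3": 1, "t4": 2}

chosen, spent, value = greedy_overlapping_cost(
    user_threads, thread_cost, thread_value, budget=5
)
print(chosen, spent, value)
```

In this toy instance, once "alice" is selected the shared thread "t2" is already paid for, so adding "bob" costs only the price of "t3". That sharing of costs across selected users is the feature that, under the assumptions above, separates this setting from ordinary budgeted maximum coverage, where each chosen set has a fixed cost.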
