Abstract

Bipartite graphs are extensively used to model relationships between two different types of entities. In many real-world bipartite graphs, relationships are naturally uncertain for reasons such as data noise, measurement error, and imprecise data, leading to uncertain bipartite graphs. In this paper, we propose the (<tex>$\alpha,\beta,\eta$</tex>)-core model, which is the first cohesive subgraph model on uncertain bipartite graphs. To capture the uncertainty of relationships/edges, we adopt the <tex>$\eta$</tex>-degree to measure the engagement level of a vertex: it is the largest integer <tex>$k$</tex> such that the probability of the vertex having at least <tex>$k$</tex> neighbors is not less than <tex>$\eta$</tex>. Given degree constraints <tex>$\alpha$</tex> and <tex>$\beta$</tex>, and a probability threshold <tex>$\eta$</tex>, the (<tex>$\alpha, \beta, \eta$</tex>)-core requires every upper-level vertex to have <tex>$\eta$</tex>-degree at least <tex>$\alpha$</tex> and every lower-level vertex to have <tex>$\eta$</tex>-degree at least <tex>$\beta$</tex>. An (<tex>$\alpha, \beta, \eta$</tex>)-core can be derived by iteratively removing a vertex whose <tex>$\eta$</tex>-degree is below its degree constraint and updating the <tex>$\eta$</tex>-degrees of its neighbors. This incurs prohibitively high cost due to the <tex>$\eta$</tex>-degree computation and updating, and it does not scale to large bipartite graphs, which motivates us to develop index-based approaches. We propose a basic full index that stores the (<tex>$\alpha, \beta, \eta$</tex>)-cores for all possible combinations of <tex>$\alpha, \beta$</tex>, and <tex>$\eta$</tex>, thus supporting optimal retrieval of the vertices in any (<tex>$\alpha, \beta, \eta$</tex>)-core. Because the full index has a long construction time and high space complexity, we further propose a probability-aware index that balances time and space costs. To build the probability-aware index efficiently, we design a bottom-up index construction algorithm and a top-down index construction algorithm. Extensive experiments on real-world datasets, with edge probabilities generated under different distributions, show that (1) the (<tex>$\alpha,\beta,\eta$</tex>)-core is an effective model, and (2) the proposed techniques significantly speed up index construction and query processing.
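As an illustration of the definitions above, the following is a minimal Python sketch of the η-degree and of the online peeling procedure that the index-based approaches are meant to avoid. It is not the paper's implementation: the function names `eta_degree` and `abe_core`, the edge-dictionary input format, and the assumption that edges exist independently are all choices made for this example. The η-degree is computed with a standard dynamic program over the distribution of the number of existing incident edges, and the peeling loop naively recomputes η-degrees after each removal.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def eta_degree(probs: Iterable[float], eta: float) -> int:
    """Largest k such that P(vertex has at least k neighbors) >= eta.

    Assumes incident edges exist independently with the given probabilities.
    """
    # dist[j] = probability that exactly j of the edges processed so far exist.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this edge is absent
            nxt[j + 1] += q * p       # this edge is present
        dist = nxt
    # Accumulate the tail P(deg >= k) from the largest k downward;
    # the first k whose tail reaches eta is the eta-degree.
    tail = 0.0
    for k in range(len(dist) - 1, 0, -1):
        tail += dist[k]
        if tail >= eta - 1e-12:       # small tolerance for float rounding
            return k
    return 0


def abe_core(
    edges: Dict[Tuple[Hashable, Hashable], float],
    alpha: int,
    beta: int,
    eta: float,
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Peel vertices until every upper vertex has eta-degree >= alpha
    and every lower vertex has eta-degree >= beta.

    `edges` maps (upper_vertex, lower_vertex) to an existence probability.
    """
    upper: Dict[Hashable, Dict[Hashable, float]] = {}
    lower: Dict[Hashable, Dict[Hashable, float]] = {}
    for (u, v), p in edges.items():
        upper.setdefault(u, {})[v] = p
        lower.setdefault(v, {})[u] = p

    changed = True
    while changed:
        changed = False
        for adj, other, need in ((upper, lower, alpha), (lower, upper, beta)):
            for x in list(adj):
                if eta_degree(adj[x].values(), eta) < need:
                    # Remove x and detach it from its remaining neighbors.
                    for y in adj.pop(x):
                        other[y].pop(x, None)
                    changed = True
    return set(upper), set(lower)


example = {
    ("u1", "v1"): 0.9, ("u1", "v2"): 0.9,
    ("u2", "v1"): 0.9, ("u2", "v2"): 0.9,
    ("u3", "v1"): 0.2,
}
core_upper, core_lower = abe_core(example, alpha=2, beta=2, eta=0.5)
```

In this toy graph, `u3` has a single edge of probability 0.2, so its η-degree at η = 0.5 is 0 and it is peeled; the remaining four vertices each have two edges of probability 0.9, giving probability 0.81 of having at least two neighbors, so they stay in the core. Recomputing the full distribution on every check is what makes this online approach expensive on large graphs.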
