A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data

Vasil Slavov, Praveen Rao

doi:10.1007/s00778-013-0314-1

Abstract

In this paper, we address the problem of cardinality estimation of XPath queries over XML data stored in a distributed, Internet-scale environment, such as a large-scale data sharing system designed to foster innovations in biomedical and health informatics. The cardinality estimate of an XPath expression is useful in XQuery optimization, in designing IR-style relevance ranking schemes, and in statistical hypothesis testing. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents in the network that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures, properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy to balance load and reduce bandwidth consumption. We conduct a theoretical analysis of XGossip in terms of the accuracy of cardinality estimation, message complexity, and bandwidth consumption. We present a comprehensive performance evaluation of XGossip on Amazon EC2 using a heterogeneous collection of XML documents.
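The abstract does not spell out how XGossip works internally, so the following is only a generic illustration of how a gossip protocol can let every peer estimate a network-wide count without a coordinator. It is a push-sum style simulation, not the authors' algorithm: the function name `push_sum_count`, the synchronous rounds, the uniform random peer selection, and the sample counts are all assumptions made for this sketch. Each peer starts with the number of its local documents that match a query; after enough rounds, every peer's ratio of sum to weight approaches the total across the network.

```python
import random


def push_sum_count(local_counts, rounds=100, seed=0):
    """Simulate push-sum gossip in synchronous rounds (illustrative only).

    local_counts[i] is a hypothetical number of documents at peer i that
    match some query. Returns each peer's estimate of the network total.
    """
    rng = random.Random(seed)
    n = len(local_counts)
    sums = [float(c) for c in local_counts]
    # Exactly one peer starts with weight 1, so sum/weight converges to
    # the network-wide total rather than the per-peer average.
    weights = [1.0] + [0.0] * (n - 1)

    for _ in range(rounds):
        next_sums = [0.0] * n
        next_weights = [0.0] * n
        for i in range(n):
            j = rng.randrange(n)  # gossip target chosen uniformly at random
            half_sum, half_weight = sums[i] / 2, weights[i] / 2
            # Keep one half locally and push the other half to peer j.
            next_sums[i] += half_sum
            next_weights[i] += half_weight
            next_sums[j] += half_sum
            next_weights[j] += half_weight
        sums, weights = next_sums, next_weights

    # A peer that has never received any weight has no estimate yet.
    return [s / w if w > 0 else None for s, w in zip(sums, weights)]


if __name__ == "__main__":
    counts = [3, 0, 7, 1, 4, 0, 2, 5]  # made-up local match counts
    print(push_sum_count(counts))
```

Because each round only splits and forwards values, the total sum and total weight in the network are conserved, which is why the per-peer ratios converge toward the true total. A real deployment would also have to handle message loss, peer churn, and many queries at once, concerns the abstract says XGossip addresses but which this sketch ignores.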

Journal: The VLDB Journal
Publication Date: May 17, 2013
Citations: 76

Similar Papers

A tool for Internet-scale cardinality estimation of XPath queries over distributed semistructured data
Vasil Slavov ... Praveen Rao
01 Mar 2014

An analysis on the load balancing strategies in wavelength-routed optical networks
Kai Liu ... Minglei Fu
13 Nov 2008

Cluster-based file replication in large-scale distributed systems
Harjinder S Sandhu ... Songnian Zhou
ACM SIGMETRICS Performance Evaluation Review | VOL. 20
01 Jun 1992

