A tool for Internet-scale cardinality estimation of XPath queries over distributed semistructured data

Vasil Slavov, Praveen Rao, Anas Katib

doi:10.1109/icde.2014.6816758

Abstract

We present XGossip, a novel tool for Internet-scale cardinality estimation of XPath queries over distributed XML data. XGossip relies on the principle of gossip; it is scalable and decentralized, and it can cope with network churn and failures. It employs a novel divide-and-conquer strategy for load balancing and for reducing overall network bandwidth consumption. It has a strong theoretical underpinning and provides provable guarantees on the accuracy of cardinality estimates, the number of messages exchanged, and the total bandwidth usage. In this demonstration, users will experience three engaging scenarios. In the first scenario, they can set up, configure, and deploy XGossip on Amazon Elastic Compute Cloud (EC2). In the second scenario, they can execute XGossip, pose XPath queries, and observe in real time the convergence speed of XGossip, the accuracy of cardinality estimates, the bandwidth usage, and the number of messages exchanged. In the third scenario, they can introduce network churn and failures during the execution of XGossip and observe how these affect its behavior.

Full Text