Abstract

We survey recent results on modeling and querying probabilistic XML data. The literature contains a plethora of probabilistic XML models [2, 13, 14, 18, 21, 24, 27], and most of them can be represented by means of p-documents [18] that have, in addition to ordinary nodes, distributional nodes that specify the probabilistic process of generating a random document. The above models are families of p-documents that differ in the types of distributional nodes in use. The focus of this survey is on the tradeoff between the ability to express real-world probabilistic data (in particular, by taking correlations between atomic events into account) and the efficiency of query evaluation. We concentrate on two important issues. The first is the ability to efficiently translate a p-document of one family into that of another. The second is the complexity of query evaluation over p-documents (under the usual semantics of querying probabilistic data, e.g., [4, 9, 10]). It turns out that efficient evaluation of a large class of queries (i.e., twig patterns with projection and aggregate functions) is realizable in models where distributional nodes are probabilistically independent. In other models, the evaluation of a query with projection is very often intractable. In comparison, very simple conjunctive queries are intractable over probabilistic models of relational databases, even when the tuples are probabilistically independent [9, 10]. To handle the limitation exhibited by the above tradeoff, various approaches have been proposed. The first is to allow query answers to be approximate [18], which makes the evaluation of twig patterns with projection tractable in the most expressive family of p-documents, among those considered. This tractability, however, does not carry over to nonmonotonic queries, such as twig patterns with negation or aggregation. The approach presented in [7]
