Many emerging applications need continuous querying over uncertain event streams, mostly for online monitoring. These uncertain events may come from radars, sensors, or even software hooks. The uncertainty usually stems from measurement errors, inherent ambiguities, or privacy-preserving requirements. To meet these requirements, we designed and implemented a new system called the Probabilistic Data Stream Management System (PDSMS) in Ref. 1. PDSMS is a data processing engine that runs continuous queries over probabilistic streams. However, the lack of a semantics for probabilistic databases that supports continuous distributions prevented us from giving our query operators a strong foundation. It also prevented us from proving the consistency and correctness of query operations, especially after optimization and adaptation. In fact, no semantics in the probabilistic database literature covers continuous distributions. This limitation is very restrictive because, in the real world, uncertainty is usually modeled by continuous distributions. In this paper, after presenting a basic probabilistic data model for PDSMS, we focus on querying and formally present the first semantics for probabilistic query operations that supports continuous distributions as well as discrete ones. Using this new semantics, we define our query operators (e.g., select, project, and join) formally, without ambiguity, and in a way that is compatible with the corresponding operators in relational algebra. Thus, we can also leverage many of the transformation rules of relational algebra. The new semantics allows for different strictness levels and for consistency between operators. We also prove several strictness theorems about different alternatives for the query operators.