Generalized union and project operations for pooling uncertain and imprecise information

D.A Bell,J.W Guan,Suk Kyoon Lee

doi:10.1016/0169-023x(95)00029-r

Abstract

We have previously proposed an extended relational data model with the objective of supporting uncertain information in a consistent and coherent manner. The model, which can represent both uncertainty and imprecision in data, is based on the Dempster-Shafer (D-S) theory of evidence, and it uses bel and pls functions of the theory, with their definitions extended somewhat for this purpose. Relational operations such as Select, Cartesian Product, Join, Project, Intersect, and Union have previously been defined [21]. In this paper we consider two data combination problems associated with the data model. These problems are believed to be inherent in most database models which handle uncertain information. The problems are: the potential existence in the database of identical tuples which have different respective degrees of belief (the redundancy problem), and the potential existence of different tuples with the same key values (the inconsistency problem). The redundancy problem was treated to some extent in an earlier paper, but the inconsistency problem has not been considered at all yet. Now the well-known orthogonal sum operation in the D-S theory, which performs the pooling of data for the purpose of making choices between hypotheses, may be viewed as a means of reducing inconsistency in data arising from different sources. This capability has not yet been exploited in our data model. So the idea here is to define a new combine operation as a primitive for handling inconsistency in relations. When data from a number of sources is being pooled — often in order to support decision making — the Union operation, and the Project operation, are very important. We are particularly interested in the case where tuples in operand relations match attribute-wise, but have different uncertainty and imprecision characteristics. The execution of both the Union and Project operations, which the new combine operation can help solve, is a means of dealing with the problem of information aggregation. We use the orthogonal sum, which generalizes results from traditional probability theory in a natural and correct manner, for pooling evidence during the combine computation. The paper also addresses the execution efficiency of our suggested approach. The orthogonal sum operation is exponentially complex if implemented naively. A linear time algorithm can readily be made available for Union and Project for the simple case where the attribute values to be combined are singletons (i.e., atomic values — as in the conventional relational model). However, many potential applications of the approach can exploit the new data model's facility of supporting set-valued attributes. In the method presented here we can combine data supporting non-singleton subsets in linear-time.

Full Text