Today's database systems must deal with uncertainty in the data they store. Consequently, there is a strong need for mining probabilistic databases. Because probabilistic data in first normal form relations is redundant, existing mining techniques are inadequate for discovering probabilistic databases. This paper designs a new strategy for identifying potentially useful patterns in probabilistic databases. A dependent rule is thus identified in a probabilistic database, represented in the form X → Y with conditional probability matrix MY|X . This method uses an instance selection to increase efficiency, enabling us to reduce the search space. We evaluated the proposed technique, and our experimental results demonstrate that the approach is effective and efficient.
Read full abstract