Abstract

Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and

Highlights

  • In many applications, the underlying databases are uncertain

  • We propose an efficient approximate probabilistic frequent itemset mining solution using specific models to capture the frequentness of an itemset

  • Model-based algorithms can significantly improve the performance of probabilistic frequent itemset (PFI) discovery while retaining a high degree of accuracy
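The idea of modelling an itemset's frequentness with a Poisson or normal distribution can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes that each transaction contains the itemset independently with a known probability, so the support count follows a Poisson binomial distribution, and it compares the exact probability that the support reaches a threshold with the two approximations. The function names, the `probs` list, and the `minsup` parameter are hypothetical and chosen only for illustration.

```python
import math


def exact_frequentness(probs, minsup):
    """Exact P(support >= minsup) by dynamic programming over transactions."""
    dist = [1.0]  # dist[k] = P(support == k) over the transactions seen so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += mass * p       # transaction contains the itemset
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Poisson approximation with rate equal to the expected support."""
    lam = sum(probs)
    term = math.exp(-lam)  # P(X == 0)
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= lam / (k + 1)  # P(X == k + 1) from P(X == k)
    return 1.0 - cdf


def normal_frequentness(probs, minsup):
    """Normal approximation with continuity correction."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: the support is deterministic.
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)


if __name__ == "__main__":
    # Hypothetical data: 200 transactions, each containing the itemset with probability 0.1.
    probs = [0.1] * 200
    for fn in (exact_frequentness, poisson_frequentness, normal_frequentness):
        print(fn.__name__, fn(probs, 20))
```

The exact computation takes time quadratic in the number of transactions, whereas both approximations need only a single pass to accumulate the mean (and, for the normal model, the variance). This is the kind of saving a model-based approach aims for.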



Introduction

Data collected from sensors (e.g., temperature and humidity readings) are noisy [1]. Customer purchase behaviors, as captured in supermarket basket databases, contain statistical information for predicting what a customer will buy in the future [4,39]. Integration and record linkage tools associate confidence values with their output tuples according to the quality of matching [14]. Confidence values are also attached to rules for extracting patterns from unstructured data [40]. Uncertain databases have been proposed to offer better support for handling imprecise data in these applications [10,14,21,23,30].
