Abstract

In recent years, uncertain data has become pervasive, as advanced technologies continuously produce large and growing amounts of incomplete data. Many modern applications in which uncertainty arises are distributed in nature, e.g., distributed sensor networks, information extraction, data integration, and social networks. Consequently, although data uncertainty has been studied in centralized settings, managing uncertain data in situ remains a challenging issue. In this paper, we propose a framework for managing uncertain categorical data in distributed environments. The framework is built upon a hierarchical indexing technique based on an inverted index, together with a distributed algorithm that efficiently processes queries on uncertain data in a distributed environment. Leveraging this indexing technique, we address two kinds of queries on distributed uncertain databases: 1) a distributed probabilistic threshold query, whose answers satisfy the probabilistic threshold requirement; and 2) a distributed top-k query, optimizing the transfer of tuples from the distributed sources to the coordinator site and the processing time. Extensive experiments verify the effectiveness and efficiency of the proposed method in terms of communication cost and response time.

Highlights

  • In recent years, data has become increasingly uncertain as advanced technologies continuously produce large amounts of incomplete data, data with missing values, and uncertain data

  • We address the problem of indexing and query processing on uncertain categorical data in distributed environments

  • To address these drawbacks, we propose an approach that uses a Local Uncertain Index (LUI) for the uncertain data on each local site, while a Global Uncertain Index (GUI) summarizes the local indexes


Summary

INTRODUCTION

Data has become increasingly uncertain as advanced technologies continuously produce large amounts of incomplete data, data with missing values, and uncertain data. Many efforts have been devoted to studying uncertain databases. These efforts have yielded different approaches and algorithms for modeling and representing uncertain data [1,2,3], as well as indexing techniques and query processing over uncertain data [3,4,5,6,7,8,9]. Most of this work assumes a centralized setting. Notable exceptions include recent work on indexing and query processing of distributed uncertain data [10,11,12,13]. However, these works have only considered top-k queries on uncertain real-valued attributes. We propose an original approach that efficiently answers queries on distributed uncertain categorical data with minimum communication and processing costs.
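To make the setting concrete, the following is a minimal sketch of how threshold and top-k queries over distributed uncertain categorical data could be answered with per-site inverted indexes and a coordinator-side summary. It is not the paper's algorithm: the function names, the sample data, and the choice to summarize each site by its maximum probability per value are all assumptions made for illustration.

```python
from collections import defaultdict


def build_local_index(tuples):
    """Hypothetical local index: value -> [(tuple_id, prob)], highest prob first."""
    index = defaultdict(list)
    for tid, dist in tuples.items():
        for value, p in dist.items():
            index[value].append((tid, p))
    for postings in index.values():
        postings.sort(key=lambda e: (-e[1], e[0]))
    return dict(index)


def build_global_summary(local_indexes):
    """Hypothetical global summary: value -> {site: max prob of value at site}."""
    summary = defaultdict(dict)
    for site, index in local_indexes.items():
        for value, postings in index.items():
            summary[value][site] = postings[0][1]  # postings are sorted
    return dict(summary)


def threshold_query(value, tau, local_indexes, summary):
    """Return tuples with P(attr = value) >= tau and the list of sites contacted."""
    results, contacted = [], []
    for site, max_p in sorted(summary.get(value, {}).items()):
        if max_p < tau:
            continue  # no tuple at this site can qualify, so skip the site
        contacted.append(site)
        for tid, p in local_indexes[site][value]:
            if p < tau:
                break  # sorted postings: the rest are below the threshold
            results.append((site, tid, p))
    return results, contacted


def top_k_query(value, k, local_indexes):
    """Naive top-k: each site ships its local top-k, the coordinator merges."""
    candidates = []
    for site, index in local_indexes.items():
        for tid, p in index.get(value, [])[:k]:
            candidates.append((site, tid, p))
    candidates.sort(key=lambda e: (-e[2], e[0], e[1]))
    return candidates[:k]


# Illustrative data: each tuple has a probability distribution over colors.
sites = {
    "site_a": {"t1": {"red": 0.7, "blue": 0.3}, "t2": {"red": 0.2, "green": 0.8}},
    "site_b": {"t3": {"red": 0.4, "blue": 0.6}, "t4": {"green": 1.0}},
    "site_c": {"t5": {"red": 0.9, "blue": 0.1}},
}
local_indexes = {site: build_local_index(t) for site, t in sites.items()}
summary = build_global_summary(local_indexes)

print(threshold_query("red", 0.5, local_indexes, summary))
print(top_k_query("red", 2, local_indexes))
```

In this sketch the summary lets the coordinator skip any site whose best probability for the queried value is below the threshold, which is one simple way to reduce communication. The top-k routine is deliberately naive and ships up to k candidates from every site.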

RELATED WORK
Data Model
PROPOSED FRAMEWORK
DISTRIBUTED UNCERTAIN INDEXING
Global Uncertain Index Structure
Local Uncertain Index Structure
DISTRIBUTED UNCERTAIN QUERY PROCESSING
Distributed Uncertain Top-k Algorithm
Efficiency of DUTh
EXPERIMENTAL STUDY
Scalability of DUTh
Efficiency of DUTk
Effectiveness of DUTk
Scalability of DUTk
Findings
CONCLUSION
