Framework for Managing Uncertain Distributed Categorical Data

Adel Benaissa, Mustapha Yahmi, Yassine Jamil

doi:10.14569/ijacsa.2017.081047

Abstract

In recent years, data has become increasingly uncertain, as advanced technologies continuously produce large amounts of incomplete data. Many modern applications in which uncertainty occurs are distributed in nature, e.g., distributed sensor networks, information extraction, data integration, and social networks. Consequently, although data uncertainty has been studied in the past for centralized settings, managing uncertainty over data in situ remains a challenging issue. In this paper, we propose a framework for managing uncertain categorical data in distributed environments. It is built upon a hierarchical indexing technique based on the inverted index, and a distributed algorithm that efficiently processes queries on uncertain data in a distributed environment. Leveraging this indexing technique, we address two kinds of queries on distributed uncertain databases: 1) a distributed probabilistic threshold query, whose answers satisfy the probabilistic threshold requirement; and 2) a distributed top-k query, which optimizes both the transfer of tuples from the distributed sources to the coordinator site and the processing time. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed method in terms of communication costs and response time.

Highlights

In recent years, data has become uncertain due to the flourishing of advanced technologies that continuously and increasingly produce large amounts of incomplete data, data with missing values, and uncertain data
We address the problem of indexing and query processing on uncertain categorical data in distributed environments
To address these drawbacks, we propose an approach that uses a Local Uncertain Index (LUI) for the uncertain data on each local site, while a Global Uncertain Index (GUI) summarizes the local indexes
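The two-level indexing idea in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each tuple carries a probability distribution over categories, that a local index is an inverted index mapping each category to a posting list sorted by descending probability, and that the global index keeps only the maximum probability per category and site. The function names, the sample data, and the pruning rule are all hypothetical.

```python
from collections import defaultdict

def build_local_index(tuples):
    """Inverted index for one site (assumed layout): category -> [(prob, tuple_id)],
    sorted by descending probability."""
    index = defaultdict(list)
    for tuple_id, distribution in tuples.items():
        for category, prob in distribution.items():
            index[category].append((prob, tuple_id))
    for postings in index.values():
        postings.sort(reverse=True)
    return dict(index)

def build_global_index(local_indexes):
    """Summary of the local indexes (assumed layout): category -> {site: max prob}."""
    summary = defaultdict(dict)
    for site, index in local_indexes.items():
        for category, postings in index.items():
            summary[category][site] = postings[0][0]  # head of a sorted list
    return dict(summary)

def threshold_query(category, tau, global_index, local_indexes):
    """Return tuples with Pr(attribute = category) >= tau and the sites contacted."""
    results, contacted = [], []
    for site, max_prob in global_index.get(category, {}).items():
        if max_prob < tau:
            continue  # no tuple on this site can qualify, so skip the site
        contacted.append(site)
        for prob, tuple_id in local_indexes[site][category]:
            if prob < tau:
                break  # postings are sorted, so the rest cannot qualify
            results.append((site, tuple_id, prob))
    return results, contacted

# Hypothetical data: two sites, each tuple is a distribution over categories.
sites = {
    "s1": {"t1": {"brake": 0.7, "tire": 0.3}, "t2": {"brake": 0.2, "engine": 0.8}},
    "s2": {"t3": {"brake": 0.1, "tire": 0.9}},
}
local_indexes = {site: build_local_index(t) for site, t in sites.items()}
global_index = build_global_index(local_indexes)
print(threshold_query("brake", 0.5, global_index, local_indexes))
```

In this toy setup the coordinator never contacts site s2 for the query on "brake" with threshold 0.5, because the summary already shows that no tuple there reaches the threshold.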

Summary

INTRODUCTION

Data has become uncertain due to the flourishing of advanced technologies that continuously and increasingly produce large amounts of incomplete data, data with missing values, and uncertain data. Many efforts have been devoted to studying uncertain databases. These efforts have yielded different approaches and algorithms for modeling and representing uncertain data [1,2,3], as well as indexing techniques and query processing over uncertain data [3,4,5,6,7,8,9]. Most of this work assumes a centralized setting; notable exceptions include recent work on indexing and query processing of distributed uncertain data [10,11,12,13]. However, these works have only considered top-k queries on uncertain real-valued attributes. We propose an original approach that efficiently answers queries on distributed uncertain data with minimum communication and processing costs.
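To make the communication concern concrete, the following sketch shows a simple baseline for a distributed top-k query on a categorical value. It is a generic merge of local top-k lists, not the algorithm proposed in the paper. It assumes each site already holds a posting list per category sorted by descending probability; the function name and the sample data are hypothetical.

```python
import heapq

def distributed_top_k(category, k, site_postings):
    """Baseline: each site ships at most its k best postings for the category,
    and the coordinator keeps the k most probable tuples overall."""
    candidates = []
    transferred = 0
    for site, index in site_postings.items():
        local_top = index.get(category, [])[:k]  # lists are sorted descending
        transferred += len(local_top)
        candidates.extend((prob, site, tuple_id) for prob, tuple_id in local_top)
    return heapq.nlargest(k, candidates), transferred

# Hypothetical posting lists: category -> [(prob, tuple_id)], sorted descending.
site_postings = {
    "s1": {"brake": [(0.9, "t1"), (0.6, "t2"), (0.2, "t4")]},
    "s2": {"brake": [(0.8, "t3"), (0.1, "t5")]},
}
top, transferred = distributed_top_k("brake", 2, site_postings)
print(top, transferred)
```

The result is exact because any tuple in the global top-k must also be in the top-k of its own site. The cost is up to k tuples per site, which is the kind of transfer a summary index at the coordinator could reduce further.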

RELATED WORK

Data Model

PROPOSED FRAMEWORK

DISTRIBUTED UNCERTAIN INDEXING

Global Uncertain Index Structure

Local Uncertain Index Structure

DISTRIBUTED UNCERTAIN QUERY PROCESSING

Distributed Uncertain Top-k Algorithm

Efficiency of DUTh

EXPERIMENTAL STUDY

Scalability of DUTh

Efficiency of DUTk

Effectiveness of DUTk

Scalability of DUTk

Findings

CONCLUSION

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2017
License type: cc-by

Similar Papers

Top-k Queries over Distributed Uncertain Categorical Data
Adel Benaissa ... Soror Sahri
01 Jan 2020

EMU: An expectation maximization based approach for clustering uncertain data
Biao Qin ... Jiaqi Ge
Journal of Intelligent & Fuzzy Systems | VOL. 25
01 Jan 2013

Integration of Uncertain Data in Geostatistical Modelling
Amílcar Soares ... Leonardo Azevedo
Mathematical Geosciences | VOL. 49
02 Jan 2017

Rule induction for uncertain data
Biao Qin ... Yuni Xia
Knowledge and Information Systems | VOL. 29
21 Aug 2010
