Data-Utility Sensitive Query Processing on Server Clusters to Support Scalable Data Analysis Services

Renwei Yu, Parth Nagarkar, K Selçuk Candan, Jong Wook Kim, Mithila Nagendra

doi:10.1007/978-3-642-19294-4_7

Abstract

The observation that a significant class of data processing and analysis applications can be expressed in terms of a small set of primitives that are easy to parallelize has led to the increasing popularity of batch-oriented, highly parallelizable cluster frameworks for supporting data analysis services. These frameworks, however, are known to have shortcomings in certain application domains. For example, in many data analysis applications, the utility of a given data element to a particular analysis task depends on the way the data is collected (e.g., its precision) or interpreted. Because existing batch-oriented data processing frameworks do not consider variations in data utility, they cannot focus their work on the best results. Even when the user is interested in only a relatively small subset of the best result instances, these systems often need to enumerate entire result sets, including low-utility results. The system discussed here is an efficient and scalable utility-aware parallel processing system for ranked query processing over large data sets. In this paper, we focus on its data partitioning and work-allocation strategies for processing top-k join queries to support data analysis services. In particular, we describe how the system adaptively samples data from “upstream” operators to help allocate resources in a manner that balances work across servers and avoids wasted work during top-k join processing. Experimental results show that the proposed sampling, data partitioning, and join processing strategies enable the system to return top-k results with high confidence and low overhead (up to ~9× faster than alternative schemes on 10 servers).

Keywords: Query Processing, Utility Score, Adaptive Sampling, Utility Space, Utility Threshold

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
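The abstract describes sampling upstream inputs so that top-k join work can be allocated across servers without enumerating low-utility results. As a rough illustration only, the Python sketch below estimates a utility threshold from Bernoulli samples of both join inputs, prunes tuples that cannot reach that threshold, hash-partitions the remaining tuples across simulated servers, and falls back to an unpruned run if the estimate turned out to be too aggressive. This is not the authors' algorithm. The product scoring function, utilities in [0, 1], the `slack` factor, and all function names (`join_scores`, `estimate_threshold`, `partitioned_top_k`, `top_k_join`) are hypothetical choices made for this example.

```python
import heapq
import random
from collections import defaultdict


def join_scores(left, right):
    """Equi-join two lists of (key, utility) pairs.

    Assumption: a join result's score is the product of its input utilities.
    Returns a list of (score, key) tuples.
    """
    index = defaultdict(list)
    for key, u in right:
        index[key].append(u)
    return [(ul * ur, key) for key, ul in left for ur in index[key]]


def estimate_threshold(left, right, k, rate, slack, rng):
    """Guess a score threshold near the k-th best result from samples.

    Each input tuple is kept with probability `rate`, so a join result
    survives sampling with probability rate**2. The k-th best full result is
    therefore expected near position k * rate**2 in the sampled ranking. The
    hypothetical `slack` factor moves the cut lower to reduce the risk of
    overshooting.
    """
    ls = [t for t in left if rng.random() < rate]
    rs = [t for t in right if rng.random() < rate]
    scores = sorted((s for s, _ in join_scores(ls, rs)), reverse=True)
    pos = int(slack * k * rate ** 2)
    return scores[pos] if pos < len(scores) else 0.0


def partitioned_top_k(left, right, k, threshold, n_servers):
    """Prune low-utility tuples, hash-partition by join key, join per 'server'."""
    parts_l = [[] for _ in range(n_servers)]
    parts_r = [[] for _ in range(n_servers)]
    # With product scoring and utilities in [0, 1], a tuple whose utility is
    # below the threshold cannot contribute to any result scoring >= threshold.
    for key, u in left:
        if u >= threshold:
            parts_l[hash(key) % n_servers].append((key, u))
    for key, u in right:
        if u >= threshold:
            parts_r[hash(key) % n_servers].append((key, u))
    # Each simulated server keeps only its local top-k. The global top-k is
    # contained in the union of these local lists.
    local = []
    for pl, pr in zip(parts_l, parts_r):
        local.extend(heapq.nlargest(k, join_scores(pl, pr)))
    return heapq.nlargest(k, local)


def top_k_join(left, right, k, rate=0.3, slack=2.0, n_servers=4, seed=0):
    rng = random.Random(seed)
    theta = estimate_threshold(left, right, k, rate, slack, rng)
    result = partitioned_top_k(left, right, k, theta, n_servers)
    # After pruning, only results scoring >= theta are guaranteed complete,
    # so the fast path is exact only if all k returned results clear theta.
    if len(result) == k and result[-1][0] >= theta:
        return result
    # The threshold was too aggressive, so fall back to an unpruned run.
    return partitioned_top_k(left, right, k, 0.0, n_servers)
```

The fallback keeps the returned scores exact regardless of sample quality. The samples only affect how much work is pruned in the common case, which loosely mirrors the confidence/overhead trade-off described in the abstract.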

Similar Papers

Concepts and applications of data mining and analysis of social networks
Azam Hajiaghajani
Journal of Data Analytics | VOL. 2 | 22 Apr 2023

Investigation of Computational Topology for Data Analysis and Visualization Applications
Robin K
Mathematical Statistician and Engineering Applications | VOL. 70 | 31 Jan 2021

Benchmarking data analysis and machine learning applications on the Intel KNL many-core processor
Chansup Byun ... David Bestor
01 Sep 2017

RanKloud
K Selçuk Candan ... Mithila Nagendra
21 Mar 2011
