Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments

Henrique Andrade, Tahsin Kurc, Alan Sussman, Joel Saltz

doi:10.1016/j.parco.2007.03.001

Abstract

In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries into primitives so that multiple reuse sites are exposed to the query optimizer, thereby increasing the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., a single symmetric multiprocessor (SMP) machine, a cluster of SMP nodes, and a Grid computing configuration).
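The reuse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the class `SemanticRangeCache`, its methods, and the restriction to one-dimensional range sums over an in-memory list are all assumptions made for illustration. Cached results are indexed by the query predicate (here, a half-open index range), and a new query reuses the largest cached result that lies entirely inside its own range, scanning raw data only for the uncovered remainder.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticRangeCache:
    """Hypothetical 1-D semantic cache for range-sum queries (illustrative only)."""
    data: list
    cache: dict = field(default_factory=dict)  # (lo, hi) -> cached sum over data[lo:hi]
    elements_read: int = 0                     # raw elements scanned so far

    def _scan(self, lo, hi):
        # Aggregate directly from raw data and record the cost.
        self.elements_read += max(0, hi - lo)
        return sum(self.data[lo:hi])

    def query_sum(self, lo, hi):
        # Assumes 0 <= lo <= hi <= len(data).
        # Find the largest cached range fully contained in [lo, hi).
        best = None
        for (clo, chi) in self.cache:
            if lo <= clo and chi <= hi:
                if best is None or (chi - clo) > (best[1] - best[0]):
                    best = (clo, chi)
        if best is None:
            total = self._scan(lo, hi)
        else:
            clo, chi = best
            # Reuse the cached aggregate; scan only the left and right remainders.
            total = self._scan(lo, clo) + self.cache[best] + self._scan(chi, hi)
        self.cache[(lo, hi)] = total
        return total
```

For example, after `query_sum(2, 6)` has been answered from raw data, a later `query_sum(0, 8)` reads only the four elements outside the cached range. A fuller treatment would also handle partially overlapping results, cache replacement, and user-defined operators that transform a cached aggregate, none of which this sketch attempts.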


Journal: Parallel Computing
Publication Date: May 1, 2007
Citations: 16

Similar Papers

Parallel processing of spatial batch-queries using $\text{xBR}^+$-trees in solid-state drives
George Roumelis ... Yannis Manolopoulos
Cluster Computing | VOL. 23
09 Nov 2019

On Efficient Processing of Multiple KNN Queries in Constrained Spatial Networks
Chongsheng Zhang ... Bo Qu
01 Oct 2008

Reinforcement-Learning-Based Query Optimization in Differentially Private IoT Data Publishing
Yili Jiang ... Liang Zhou
IEEE Internet of Things Journal | VOL. 8
20 Jan 2021

High-fidelity Simulation-based Optimum Design Utilizing Computing Grid Technology
18 Apr 2005
