Adaptive learning of aggregate analytics under dynamic workloads

Fotis Savva,Christos Anagnostopoulos,Peter Triantafillou

doi:10.1016/j.future.2020.03.063

Open Access

https://doi.org/10.1016/j.future.2020.03.063

Abstract

Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to the rescue. In this setting, analytics tasks become very costly in terms of query response time, resource consumption, and money in cloud deployments, especially when base data are stored across geographically distributed data centers. Therefore, we introduce an adaptive, reciprocity-based Machine Learning mechanism which is light-weight, is stored client-side, can estimate the answers of a variety of aggregate queries, and can avoid the big data back-end. The estimations are performed in milliseconds and are inexpensive and accurate, as the mechanism learns from past analytical-query patterns. However, as analytic queries are ad hoc and analysts’ interests change over time, we develop solutions that can swiftly and accurately detect such changes and adapt to new query patterns. The capabilities of our approach are demonstrated using extensive evaluation with real and synthetic datasets.

Highlights

With the rapid explosion of data volumes and the adoption of data-driven decision making, organizations have been struggling to process data efficiently
We introduce a Change Detection Mechanism (CDM) and an Adaptation Mechanism (ADM) to address this concern, which raises a number of challenges: (1) how to detect a query pattern change, which requires triggers that alert the mechanism, while it is in prediction mode, to a concept drift; (2) what action to take when that happens, i.e., what strategy to follow for updating the Machine Learning (ML) models; (3)
In this work we contribute a novel framework for adapting trained models under concept drift of the underlying query workload distribution
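
To make the idea of a trigger for query pattern change more concrete, here is a minimal sketch in Python. It is not the paper's CDM, whose details are not given on this page; it is a generic sliding-window detector that assumes queries are numeric vectors and that the known query patterns are summarized by centroids. The class name `DriftDetector` and the parameters `radius`, `window` and `max_outlier_rate` are hypothetical.

```python
from collections import deque
import math


class DriftDetector:
    """Illustrative detector (an assumption, not the paper's CDM).

    Flags a possible query pattern change when too many of the most
    recent queries fall far from every known query-pattern centroid.
    """

    def __init__(self, centroids, radius, window=20, max_outlier_rate=0.5):
        self.centroids = [tuple(c) for c in centroids]
        self.radius = radius                  # distance beyond which a query is "unfamiliar"
        self.recent = deque(maxlen=window)    # sliding window of outlier flags
        self.max_outlier_rate = max_outlier_rate

    def update(self, query):
        """Record one incoming query; return True if drift is suspected."""
        nearest = min(math.dist(c, query) for c in self.centroids)
        self.recent.append(nearest > self.radius)
        # Only decide once the window is full, to avoid firing on a few early queries.
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) > self.max_outlier_rate
```

A detector of this kind would be called once per incoming query while the model is in prediction mode; a `True` result would hand control to whatever adaptation strategy is in use.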

Summary

Introduction

With the rapid explosion of data volumes and the adoption of data-driven decision making, organizations have been struggling to process data efficiently. The proposed mechanism resides on an Analyst’s Device (AD) and is adaptive to dynamic query workloads. This allows the exploratory process to be executed locally at ADs, providing predictions for aggregate queries without overburdening the Cloud/Central System (CS). The system offers a learning-based, prediction-driven way of performing aggregate analytics in ADs without accessing any data. It requires data transmission neither from the CS to ADs nor from ADs to the CS. The paper includes a comprehensive assessment of the system's performance and a sensitivity analysis using real and synthetic data and query workloads.
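
As a rough illustration of predicting aggregate answers locally from past queries, the sketch below stores previously seen (query, answer) pairs and estimates the answer to a new query by inverse-distance weighting over its nearest past queries. This is a stand-in technique chosen for brevity, not the model used in the paper; the class name `QueryAnswerEstimator`, the parameter `k`, and the encoding of a query as a numeric vector are all assumptions.

```python
import math


class QueryAnswerEstimator:
    """Toy client-side estimator (an assumption, not the paper's model).

    Learns only from past (query vector, aggregate answer) pairs and
    never touches the base data.
    """

    def __init__(self, k=3):
        self.k = k
        self.history = []  # list of (query_vector, answer) pairs

    def observe(self, query, answer):
        """Store the answer the back-end returned for a past query."""
        self.history.append((tuple(query), float(answer)))

    def predict(self, query):
        """Estimate the answer from the k nearest past queries."""
        if not self.history:
            raise ValueError("no past queries to learn from")
        nearest = sorted(self.history, key=lambda qa: math.dist(qa[0], query))[: self.k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (math.dist(q, query) + 1e-9) for q, _ in nearest]
        return sum(w * a for w, (_, a) in zip(weights, nearest)) / sum(weights)
```

In this sketch, `observe` would be called while answers are still being obtained from the back-end, and `predict` would replace the back-end call once enough past queries have been collected.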

Fundamentals of query-driven learning

Query space clustering

Change detection mechanism

Query pattern change detection

Model adaptation

Taking advantage of affiliates

Convergence to an offline mode

Is there a single ML model that can be used for this purpose?

Predictability

Adaptivity

Related work

Conclusions & future work

Declaration of competing interest

Findings

Synthetic data generation

Journal: Future generations computer systems : FGCS
Publication Date: Apr 7, 2020
Citations: 6
License type: cc-by
