Abstract

Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to the rescue. In this setting, analytics tasks become very costly in terms of query response time, resource consumption, and money in cloud deployments, especially when base data are stored across geographically distributed data centers. Therefore, we introduce an adaptive, reciprocity-based Machine Learning mechanism which is light-weight, is stored client-side, can estimate the answers of a variety of aggregate queries, and can avoid the big data back-end. The estimations are performed in milliseconds and are inexpensive and accurate, as the mechanism learns from past analytical-query patterns. However, as analytic queries are ad hoc and analysts’ interests change over time, we develop solutions that can swiftly and accurately detect such changes and adapt to new query patterns. The capabilities of our approach are demonstrated through extensive evaluation with real and synthetic datasets.

Highlights

  • With the rapid explosion of data volumes and the adoption of data-driven decision making, organizations have been struggling to process data efficiently

  • We introduce a Change Detection Mechanism (CDM) and an Adaptation Mechanism (ADM) to address this concern, which raises a number of challenges: (1) how to detect a query pattern change, which requires triggers that alert the mechanism, while it is in prediction mode, to a concept drift; (2) what action to take when that happens, i.e., what strategy to follow for updating the Machine Learning (ML) models; (3)

  • In this work we contribute a novel framework for adapting trained models under concept drift of the underlying query workload distribution
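
The change-detection idea in the highlights can be illustrated with a minimal sketch. It is not the paper's CDM: the class name `QueryDriftDetector`, the fixed `radius`, the sliding `window`, and the `threshold` rule are all assumptions chosen for illustration. Queries are assumed to be numeric vectors, and a query counts as novel when it lies farther than `radius` from every known query-cluster centroid. Drift is flagged once novel queries make up more than `threshold` of a full window.

```python
import math
from collections import deque


class QueryDriftDetector:
    """Hypothetical drift trigger over a stream of query vectors."""

    def __init__(self, centroids, radius, window=20, threshold=0.5):
        # Centroids of previously learned query clusters (assumed given).
        self.centroids = [tuple(c) for c in centroids]
        self.radius = radius
        self.threshold = threshold
        # Sliding window of novelty flags for the most recent queries.
        self.recent = deque(maxlen=window)

    def is_novel(self, query):
        """True if the query is far from every known centroid."""
        nearest = min(math.dist(c, query) for c in self.centroids)
        return nearest > self.radius

    def update(self, query):
        """Record one query; return True if drift is flagged."""
        self.recent.append(self.is_novel(query))
        window_full = len(self.recent) == self.recent.maxlen
        novel_share = sum(self.recent) / len(self.recent)
        return window_full and novel_share > self.threshold
```

A caller would feed each incoming query to `update` while the models are in prediction mode and start an adaptation step when it returns `True`. Which adaptation strategy to use is left open here.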


Summary

Introduction

With the rapid explosion of data volumes and the adoption of data-driven decision making, organizations have been struggling to process data efficiently. We introduce a mechanism that resides on an Analyst’s Device (AD) and is adaptive to dynamic query workloads. This allows the exploratory process to be executed locally at ADs, providing predictions to aggregate queries without overburdening the Cloud/Central System (CS). Our system offers a learning-based, prediction-driven way of performing aggregate analytics in ADs without accessing any data. It requires data transmission neither from the CS to ADs nor from ADs to the CS. We provide a comprehensive assessment of the system performance and a sensitivity analysis using real and synthetic data and query workloads.
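
To make the prediction-driven idea concrete, the following is a minimal sketch of answering an aggregate query on the analyst's side from past query-answer pairs only. It is an illustration under stated assumptions, not the paper's models: queries are assumed to be numeric vectors (for example, a range centre plus a radius), and an inverse-distance-weighted average over the `k` nearest past queries stands in for the learned estimator. The name `LocalAnswerEstimator` and its methods are hypothetical.

```python
import math


class LocalAnswerEstimator:
    """Hypothetical client-side estimator built from past (query, answer) pairs."""

    def __init__(self, k=3):
        self.k = k
        self.history = []  # list of (query_vector, answer) pairs

    def observe(self, query, answer):
        """Store one executed query and the answer the back-end returned."""
        self.history.append((tuple(query), float(answer)))

    def predict(self, query):
        """Estimate the answer to a new query without touching base data."""
        if not self.history:
            raise ValueError("no past queries to learn from")
        # Pick the k past queries closest to the new one.
        nearest = sorted(
            self.history, key=lambda pair: math.dist(pair[0], query)
        )[: self.k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (math.dist(q, query) + 1e-9) for q, _ in nearest]
        weighted_sum = sum(w * a for w, (_, a) in zip(weights, nearest))
        return weighted_sum / sum(weights)
```

In this sketch only `observe` needs answers from the CS, during a training phase. `predict` runs entirely on the AD, which mirrors the claim that no data are transmitted at prediction time.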

Fundamentals of query-driven learning
Query space clustering
Change detection mechanism
Query pattern change detection
Model adaptation
Taking advantage of affiliates
Convergence to an offline mode
Is there a single ML model that can be used for this purpose?
Predictability
Adaptivity
Related work
Conclusions & future work
Declaration of competing interest
Findings
Synthetic data generation
