Abstract

Crafting scalable analytics to extract actionable business intelligence is a challenging endeavour, requiring multiple layers of expertise and experience. Often, this expertise is irreconcilably split between an organisation’s engineers and subject matter domain experts. Previous approaches to this problem have relied on technically adept users with tool-specific training. Such an approach faces a number of challenges: Expertise: there are few data-analytic subject domain experts with in-depth technical knowledge of compute architectures; Performance: analysts do not generally make full use of the performance and scalability capabilities of the underlying architectures; Heterogeneity: calculating the most performant and scalable mix of real-time (on-line) and batch (off-line) analytics in a problem domain is difficult; Tools: supporting frameworks often address several tasks, including composition, planning, code generation, validation, performance tuning and analysis, but do not typically provide end-to-end solutions embedding all of these activities. In this paper, we present a novel semi-automated approach to the composition, planning, code generation and performance tuning of scalable hybrid analytics, using a semantically rich type system which requires little programming expertise from the user. This approach is the first of its kind to permit domain experts with little or no technical expertise to assemble complex and scalable analytics, for hybrid on- and off-line analytic environments, with no additional requirement for low-level engineering support. This paper describes (i) an abstract model of analytic assembly and execution, (ii) goal-based planning and (iii) code generation for hybrid on- and off-line analytics.
An implementation, through a system which we call Mendeleev, is used to (iv) demonstrate the applicability of this technique through a series of case studies, where a single interface is used to create analytics that can be run simultaneously over on- and off-line environments. Finally, we (v) analyse the performance of the planner, and (vi) show that the performance of Mendeleev’s generated code is comparable with that of hand-written analytics.
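The abstract describes goal-based planning over processing elements (PEs) annotated with a semantically rich type system. As an informal illustration only (the PE names, types and search strategy below are hypothetical, not Mendeleev's actual model), such a planner can be sketched as a search for a chain of typed PEs that transforms the available source types into the requested goal type:

```python
# Hypothetical sketch of goal-based planning over typed PEs.
# Every name and type here is illustrative, not the paper's schema:
# each PE declares the semantic types it consumes and the type it
# produces, and a plan is a PE chain that reaches the goal type.
from collections import deque

# (name, required input types, output type)
PES = [
    ("CrawlUsers",  frozenset(),               "UserStream"),
    ("ExtractGeo",  frozenset({"UserStream"}), "GeoTag"),
    ("ClusterGeo",  frozenset({"GeoTag"}),     "GeoCluster"),
    ("LoadPersons", frozenset(),               "PersonRecord"),
]

def plan(goal, sources):
    """Breadth-first search for a PE chain producing `goal`."""
    queue = deque([(frozenset(sources), [])])
    seen = set()
    while queue:
        types, chain = queue.popleft()
        if goal in types:
            return chain
        if types in seen:
            continue
        seen.add(types)
        for name, needs, out in PES:
            if needs <= types and out not in types:
                queue.append((types | {out}, chain + [name]))
    return None  # goal unreachable from the given sources

print(plan("GeoCluster", {"UserStream"}))  # ['ExtractGeo', 'ClusterGeo']
```

Breadth-first search returns a shortest chain; a production planner would additionally weigh the cost and runtime annotations of each PE.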

Highlights

  • This paper presents a new approach to this problem, providing a framework in which domain experts can compose and deploy efficient and scalable hybrid analytics without prior engineering knowledge

  • It is important to note that the creation of this knowledge-base is beyond the scope of this research: it is assumed that engineers in organisations with a need for an analytic planning system are willing to undertake the manual annotation of the processing elements (PEs) they make available to their users
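The knowledge-base mentioned above consists of manually annotated PEs. As a purely hypothetical sketch (the field names and values below are illustrative, not the paper's actual schema), a single annotation and a chaining check might look like:

```python
# Hypothetical shape of two PE annotations in such a knowledge-base;
# every field name and value is illustrative, not the paper's schema.
extract_geo = {
    "name": "ExtractGeo",
    "inputs": ["UserStream"],              # semantic types consumed
    "output": "GeoTag",                    # semantic type produced
    "runtimes": ["on-line", "off-line"],   # supported targets
}
cluster_geo = {
    "name": "ClusterGeo",
    "inputs": ["GeoTag"],
    "output": "GeoCluster",
    "runtimes": ["off-line"],
}

def compatible(producer, consumer):
    """Two PEs can be chained if the producer's output type is among
    the consumer's inputs and they share at least one runtime."""
    return (producer["output"] in consumer["inputs"]
            and bool(set(producer["runtimes"]) & set(consumer["runtimes"])))

print(compatible(extract_geo, cluster_geo))  # True
```

Annotations of this kind are what would let a planner reason about composition and runtime placement without further engineering input.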

Introduction

The engineer understands parallel architectures and how to build scalable systems, while the domain expert understands the detailed semantics of their data and the appropriate queries on that data. If user data is being crawled, for example, a streaming (on-line) analytic engine such as Apache Storm [2] or IBM InfoSphere Streams [27] might be employed for subset A, while person data in subset B might reside in an HDFS (Hadoop Distributed File System) [32] data store. Each of these runtime environments specifies its own programming model, optimisation constraints and engineering best practices. This complexity increases when constructing a hybrid analytic which makes use of data from multiple runtimes: should subset C of this Flickr analytic be executed in an on- or off-line runtime environment, and which configuration would be most performant and scalable? The remainder of this paper is structured as follows: Section 2 describes related work; Section 3 outlines the high-level approach adopted in this research and the implications of design choices; Sections 4 and 5 detail our approach to modelling analytics and planning their execution respectively; Section 6 describes the process of efficient code generation; Section 7 illustrates the application of this approach through four case studies; Sections 8 and 9 provide a performance evaluation of this framework and conclude the paper.
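The hybrid targeting described above can be illustrated with a deliberately simplified sketch (this is an assumption for illustration, not Mendeleev's actual generator): the same abstract PE chain is lowered to skeleton code for either an on-line, Storm-style topology or an off-line, Hadoop-style job.

```python
# Illustrative sketch (assumed, not the paper's generator) of lowering
# one abstract PE chain to skeleton code for two different runtimes.
def generate(chain, runtime):
    if runtime == "on-line":
        # each PE becomes a stage in a streaming topology
        return "; ".join(f"topology.addBolt({pe})" for pe in chain)
    # each PE becomes a stage in a batch pipeline
    return "; ".join(f"job.addStage({pe})" for pe in chain)

print(generate(["ExtractGeo", "ClusterGeo"], "on-line"))
# topology.addBolt(ExtractGeo); topology.addBolt(ClusterGeo)
```

The point of the sketch is that placement (on- or off-line) is a late, per-subset decision over a single abstract description, which is what makes the configuration question above tractable.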

Related work
High-level overview
Methodology
Impact of design choices
Modelling analytics
PE formalism
PE model abstraction
Goal-based planning
Type closure
Conditions
Code generation
DSL code generation
Native code generation
Integrating complex analytics
Case studies
Case study
Performance evaluation
PE Used
Runtime performance
Latency over time
Conclusions & further work