Abstract

Cloud computing, as a disruptive technology, provides a dynamic, elastic and promising computing environment to tackle the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open-source frameworks in cloud computing for storing and processing big data in a scalable fashion. Spark is a more recent parallel computing engine that works together with Hadoop and exceeds MapReduce performance through its in-memory computing and high-level programming features. In this paper, we present our design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates that simplify the programming effort. We conducted experiments on its productivity and performance with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The cloud platform generates a complete data processing application from the user's kernel program and simple configurations, allocates resources, and executes the application in parallel on top of Spark and Hadoop.
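As a concrete illustration of the kernel-plus-configuration workflow described above, the sketch below shows what a user-supplied per-trace kernel and its accompanying configuration might look like. This is a minimal sketch only; the kernel name, configuration keys, and HDFS paths are hypothetical and are not taken from the SAC implementation.

    // Hypothetical sketch of the "kernel + simple configuration" workflow.
    // The kernel name, configuration keys, and HDFS paths are illustrative only.
    object UserKernelSketch {
      // User-supplied kernel: scales every sample in one seismic trace,
      // independent of how the platform partitions and distributes the volume.
      def scaleTrace(trace: Array[Float], gain: Float): Array[Float] =
        trace.map(_ * gain)

      def main(args: Array[String]): Unit = {
        // The "simple configuration" the platform would combine with the kernel
        // to generate and launch a complete Spark application.
        val config = Map(
          "template" -> "per-trace",                    // processing template chosen by the user
          "input"    -> "hdfs:///seismic/input_volume", // hypothetical input path
          "output"   -> "hdfs:///seismic/output_volume" // hypothetical output path
        )
        println(s"Kernel scaleTrace would run under configuration: $config")
      }
    }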

Highlights

  • Cloud computing, as a disruptive technology, provides a dynamic, elastic and easy-to-use computing environment to tackle the challenges of big data processing and analytics

  • The cloud can provide three different types of service in this regard, categorized as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [1]

  • The SeismicVolume class provides functions for constructing Resilient Distributed Datasets (RDDs) based on the processing template the user has selected, applying the user's algorithms to the RDDs, and storing the final RDD on the Hadoop Distributed File System (HDFS) in a format defined by the user (a simplified sketch follows this list)
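The following is a minimal sketch, using Spark's Scala RDD API, of how a SeismicVolume-style wrapper could build an RDD, apply a user kernel, and persist the result to HDFS. The class name SeismicVolumeSketch and its methods loadTraces, applyKernel, and save are illustrative placeholders, not the published SAC interface.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Illustrative only: mirrors the construct / apply / store steps described
    // in the highlight above, not the actual SeismicVolume implementation.
    class SeismicVolumeSketch(sc: SparkContext, inputPath: String) {

      // Construct an RDD of traces; here each text line stands in for one trace
      // of comma-separated samples.
      def loadTraces(): RDD[Array[Float]] =
        sc.textFile(inputPath).map(_.split(",").map(_.toFloat))

      // Apply the user's per-trace algorithm in parallel across partitions.
      def applyKernel(traces: RDD[Array[Float]],
                      kernel: Array[Float] => Array[Float]): RDD[Array[Float]] =
        traces.map(kernel)

      // Store the final RDD on HDFS in a user-defined (here, CSV-like) format.
      def save(result: RDD[Array[Float]], outputPath: String): Unit =
        result.map(_.mkString(",")).saveAsTextFile(outputPath)
    }

A driver generated by the platform would then chain these steps: load the traces, apply the user's kernel, and save the result.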

Summary

Introduction

Cloud computing, as a disruptive technology, provides a dynamic, elastic and easy-to-use computing environment to tackle the challenges of big data processing and analytics. In many industries, such as retail, energy, oil & gas, security/surveillance, image/video, social networks, and financial/trading, a cloud-based big data analytics platform is becoming important to support daily work by delivering the required storage space, processing power, and intelligent analytics capacity. One challenge these industries face in common is fast-growing data volume. We studied the oil & gas industry's requirements for domain data processing and analytics, and designed a domain-specific big data processing and analytics cloud for the industry.

Apache Hadoop
Apache Spark
Seismic Analytics Cloud Implementation
The Architecture of Seismic Analytics Cloud
Input Data and Redirection
Code Generation
Driver and Job Executor
Experiment and Results
SAC Web UI
Seismic Calculator
Histogram
Performance Analysis
Usability Analysis
Performance Analysis of Seismic Calculator
Performance Analysis of FFT