Abstract

Cloud computing, as a disruptive technology, provides a dynamic, elastic and promising computing environment to tackle the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open-source frameworks in cloud computing for storing and processing big data in a scalable fashion. Spark is a more recent parallel computing engine that works together with Hadoop and exceeds MapReduce performance through its in-memory computing and high-level programming features. In this paper, we present our design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates that simplify the programming effort. We conducted experiments on its productivity and performance with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The cloud platform generates a complete data processing application from the user's kernel program and simple configurations, allocates resources, and executes the application in parallel on top of Spark and Hadoop.
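As a concrete illustration of the kernel-plus-configuration workflow described above, the sketch below shows what a user-supplied per-trace kernel and its accompanying configuration might look like. This is a minimal sketch only; the kernel name, configuration keys, and HDFS paths are hypothetical and are not taken from the SAC implementation.

    // Hypothetical sketch of the "kernel + simple configuration" workflow.
    // The kernel name, configuration keys, and HDFS paths are illustrative only.
    object UserKernelSketch {
      // User-supplied kernel: scales every sample in one seismic trace,
      // independent of how the platform partitions and distributes the volume.
      def scaleTrace(trace: Array[Float], gain: Float): Array[Float] =
        trace.map(_ * gain)

      def main(args: Array[String]): Unit = {
        // The "simple configuration" the platform would combine with the kernel
        // to generate and launch a complete Spark application.
        val config = Map(
          "template" -> "per-trace",                    // processing template chosen by the user
          "input"    -> "hdfs:///seismic/input_volume", // hypothetical input path
          "output"   -> "hdfs:///seismic/output_volume" // hypothetical output path
        )
        println(s"Kernel scaleTrace would run under configuration: $config")
      }
    }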

Highlights

  • Cloud computing, as a disruptive technology, provides a dynamic, elastic and easy-to-use computing environment to tackle the challenges of big data processing and analytics

  • The cloud can provide three different types of service in this regard, categorized as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [1]

  • The SeismicVolume class provides functions for constructing Resilient Distributed Datasets (RDDs) based on the processing template the user has selected, applying the user's algorithms to the RDDs, and storing the final RDD on the Hadoop Distributed File System (HDFS) in a format defined by the user (a simplified sketch follows this list)
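The following is a minimal sketch, using Spark's Scala RDD API, of how a SeismicVolume-style wrapper could build an RDD, apply a user kernel, and persist the result to HDFS. The class name SeismicVolumeSketch and its methods loadTraces, applyKernel, and save are illustrative placeholders, not the published SAC interface.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Illustrative only: mirrors the construct / apply / store steps described
    // in the highlight above, not the actual SeismicVolume implementation.
    class SeismicVolumeSketch(sc: SparkContext, inputPath: String) {

      // Construct an RDD of traces; here each text line stands in for one trace
      // of comma-separated samples.
      def loadTraces(): RDD[Array[Float]] =
        sc.textFile(inputPath).map(_.split(",").map(_.toFloat))

      // Apply the user's per-trace algorithm in parallel across partitions.
      def applyKernel(traces: RDD[Array[Float]],
                      kernel: Array[Float] => Array[Float]): RDD[Array[Float]] =
        traces.map(kernel)

      // Store the final RDD on HDFS in a user-defined (here, CSV-like) format.
      def save(result: RDD[Array[Float]], outputPath: String): Unit =
        result.map(_.mkString(",")).saveAsTextFile(outputPath)
    }

A driver generated by the platform would then chain these steps: load the traces, apply the user's kernel, and save the result.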

Summary

Introduction

Cloud computing, as a disruptive technology, provides a dynamic, elastic and easy-to-use computing environment to tackle the challenges of big data processing and analytics. In many industries, such as retail, energy, oil & gas, security/surveillance, image/video, social networks, and financial/trading, a cloud-based big data analytics platform is becoming important to support daily work by delivering the required storage space, processing power, and intelligent analytics capacity. One challenge these industries face in common is fast-growing data volume. We studied the oil & gas industry's requirements for domain data processing and analytics, and designed a domain-specific big data processing and analytics cloud for the industry.

Apache Hadoop
Apache Spark
Seismic Analytics Cloud Implementation
The Architecture of Seismic Analytics Cloud
Input Data and Redirection
Code Generation
Driver and Job Executor
Experiment and Results
SAC Web UI
Seismic Calculator
Histogram
Performance Analysis
Usability Analysis
Performance Analysis of Seismic Calculator
Performance Analysis of FFT