Performance analysis model for big data applications in cloud computing

Luis Eduardo Bautista Villalpando, Alain April, Alain Abran

doi:10.1186/s13677-014-0019-z

Abstract

The foundation of Cloud Computing is the sharing of computing resources that are dynamically allocated and released on demand with minimal management effort. Most of the time, computing resources such as processors, memory and storage are allocated through commodity hardware virtualization, which distinguishes cloud computing from other technologies. One of the objectives of this technology is processing and storing very large amounts of data, also referred to as Big Data. Sometimes, anomalies and defects found in Cloud platforms affect the performance of Big Data Applications, resulting in degradation of Cloud performance. One of the challenges in Big Data is how to analyze the performance of Big Data Applications in order to determine the main factors that affect their quality. The performance analysis results are very important because they help to detect the source of the degradation of the applications as well as of the Cloud. Furthermore, such results can be used in future resource planning stages, when designing Service Level Agreements, or simply to improve the applications. This paper proposes a performance analysis model for Big Data Applications, which integrates software quality concepts from ISO 25010. The main goal of this work is to fill the gap between the quantitative (numerical) representation of software engineering quality concepts and the measurement of the performance of Big Data Applications. To this end, statistical methods are used to establish relationships between performance measures extracted from Big Data Applications, Cloud Computing platforms and the software engineering quality concepts.

Highlights

According to ISO subcommittee 38, the CC study group, Cloud Computing (CC) is a paradigm for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable cloud resources accessed through services which can be rapidly provisioned and released with minimal management effort or service provider interaction [1]. One of the challenges in CC is how to process and store large amounts of data (known as Big Data, BD) in an efficient and reliable way.
This paper presents the conclusions of our research, which proposes a performance analysis model for big data applications: the Performance Analysis Model (PAM) for Big Data Applications (BDA).

Summary

Introduction

According to ISO subcommittee 38, the CC study group, Cloud Computing (CC) is a paradigm for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable cloud resources accessed through services which can be rapidly provisioned and released with minimal management effort or service provider interaction [1]. One of the challenges in CC is how to process and store large amounts of data (known as Big Data, BD) in an efficient and reliable way.

Hadoop is divided into several subprojects that fall under the umbrella of infrastructures for distributed computing. One of these subprojects is MapReduce, a programming model with an associated implementation, both developed by Google for processing and generating large datasets. Authors such as Lin [14] point out that the issue of tackling large amounts of data is today addressed by a divide-and-conquer approach, the basic idea being to partition a large problem into smaller subproblems. Those subproblems can be handled in parallel by different workers: for example, threads in a processor core, cores in a multi-core processor, multiple processors in a machine, or many machines in a cluster. The intermediate results of each individual worker are then combined to yield the final output.
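The divide-and-conquer idea described above can be illustrated with a minimal single-machine sketch. It does not use Hadoop or its API. A Python thread pool stands in for the workers, and word counting is an assumed example task that does not come from the paper. The function names (`partition`, `map_word_counts`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split a list into at most n_parts contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_word_counts(lines):
    """Worker step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Combine step: merge the intermediate results of all workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=4):
    """Partition the input, process the chunks in a worker pool, then combine."""
    chunks = partition(lines, n_workers)
    # Threads are an assumption made for brevity; in a cluster setting
    # each chunk would be sent to a different machine instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_word_counts, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = [
        "big data in the cloud",
        "cloud computing and big data",
        "data data",
    ]
    print(word_count(sample, n_workers=2).most_common(3))
```

The three stages mirror the description in the text: the problem is partitioned, each subproblem is processed without reference to the others, and the intermediate results are merged into the final output. Because the worker step shares no state between chunks, the same structure carries over when the workers are cores, processors or machines rather than threads.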

Objectives

Results

Conclusion