Abstract

Cloud computing has emerged as a new computing paradigm in which scalable and virtualized resources are dynamically provided (anytime, everywhere, and in a transparent way) as services over the Internet.1–3 In cloud computing environments, users can employ at any time a variety of devices, such as PCs, laptops, smartphones, and PDAs, to access programs, storage, and application-development platforms through services offered by cloud computing providers. In fact, users can benefit from the high availability, easy scalability, and low cost of cloud computing resources (ie, software, infrastructure, and platform).4 For instance, cloud computing offers a collection of IT services referred to as Software-as-a-Service (SaaS) that allows users to run their applications remotely. Infrastructure-as-a-Service (IaaS) delivers computing resources as services, whereas Platform-as-a-Service (PaaS) offers tools and resources for application development, including operating systems. Data-Storage-as-a-Service has also emerged in the past few years to provide users with storage capabilities. Generally, cloud computing infrastructures transparently manage hardware/software concerns such as job scheduling and resource allocation by hiding implementation details, so that users can focus on accessing and using remote resources and services rather than on computing and data storage/access issues. Furthermore, various efforts have recently been dedicated to improving the performance of the services offered by cloud computing environments. However, many issues still need to be tackled, mainly scalability, availability, security, and privacy.4 Despite these issues, cloud computing continues to play a considerable role in many existing and emerging application domains.5, 6

In parallel to this progress, big data technologies have been developed and deployed rapidly and rely heavily on cloud computing infrastructures for both storage and data processing. In many studies, these technologies are considered among the most remarkable technologies for developing context-driven applications and services in domains such as transportation, health, and energy.7–9 In other words, big data technologies constitute one of the current and future research frontiers, revolutionizing many fields, including business, scientific research, and public administration. However, the high volume, high velocity, and/or high variety of the available information require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.10, 11 Even though these technologies have been developing very fast over the past few years, handling big data still raises many challenges beyond inconsistency and incompleteness; the difficulties mainly lie in real-time data gathering, storage, mining, predictive analytics, and visualization.12

Furthermore, IoT technologies have shown great potential for collecting large volumes of data streams from sensor readings. In fact, a myriad of sensors can be deployed to gather contextual data that can be integrated with other data, such as location, weather, and social media data. Processing these heterogeneous data enables the development of context-aware applications and services, which can, for example, provide real-time traffic routing throughout a city, detect and immediately act on environmental pollution peaks, or automatically optimize the logistics chain through instantaneous reactions (eg, via actuators). These data streams first have to be pre-processed locally before being sent, through IoT platforms (eg, ThingSpeak) and wireless technologies (eg, WiFi and LTE), to cloud infrastructures for storage and processing with machine learning algorithms (eg, deep learning). This heavy processing and high communication load are, however, serious challenges for the development of large-scale IoT applications over cloud computing environments.7, 9
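As a purely illustrative example of the edge pre-processing and upload step described above (a minimal sketch, not taken from any of the cited works), the snippet below aggregates a window of raw sensor readings locally and pushes only the aggregate to a ThingSpeak-style HTTP update endpoint; the endpoint URL, API key, field name, and sampling window are assumptions made for illustration.

```python
# Minimal sketch: reduce raw sensor readings locally, then push only the
# aggregate to a ThingSpeak-style HTTP update endpoint for cloud-side storage.
# The endpoint, API key, field name, and sampling window are illustrative only.
import statistics
import urllib.parse
import urllib.request

UPDATE_URL = "https://api.thingspeak.com/update"  # assumed ThingSpeak-style endpoint
WRITE_API_KEY = "YOUR_WRITE_API_KEY"              # placeholder

def preprocess(readings):
    """Reduce a window of raw readings to a single robust value (median here)."""
    return statistics.median(readings)

def upload(value, field="field1"):
    """Send one pre-processed value to the cloud via an HTTP GET update."""
    params = urllib.parse.urlencode({"api_key": WRITE_API_KEY, field: value})
    with urllib.request.urlopen(f"{UPDATE_URL}?{params}") as resp:
        return resp.status == 200

if __name__ == "__main__":
    window = [21.4, 21.6, 35.0, 21.5, 21.3]  # raw temperature samples, one outlier
    upload(preprocess(window))               # only the aggregate leaves the edge device
```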
The aim of this special issue is to present recent contributions and results in the fields of cloud computing, IoT and big data applications, systems architecture and services, virtualization, security and privacy, and high-performance computing and applications, with an emphasis on how to build cloud computing platforms with real impact. This special issue includes articles that address the state of the art in cloud computing and big data technologies. A total of thirteen papers presented at the 2016 International Conference on Cloud Computing and Applications (CloudTech'16)13 were extended and submitted to this special issue. CloudTech'16 was held in Marrakech from 24 to 26 May 2016 and is one of the most successful events in the field of cloud computing and big data technologies. CloudTech addresses topics related to cloud computing technologies and applications, such as architecture, network protocols, storage, processing, and security, as well as topics related to big data, such as storage, processing, and applications. In total, 135 papers from 12 countries were submitted to CloudTech'16, of which 49 were accepted and presented. The authors of the thirteen best presented papers were invited to submit extended versions of their manuscripts to this special issue. All papers were reviewed as regular submissions to the CCPE journal. After four reviewing rounds, ten of them were accepted for publication in this special issue; they are briefly described in the rest of this section.

Task and job scheduling in cloud computing infrastructures remains a challenge: a large number of requests must be satisfied in a reasonable time while optimizing resource usage. In this context, Ebadifard and Babamir,14 in their paper "A PSO-based task scheduling algorithm improved using a load balancing technique for the cloud computing environment," introduced a static task scheduling method based on the particle swarm optimization (PSO) algorithm, in which tasks are assumed to be non-preemptive and independent. Results showed that the proposed method outperforms round-robin task scheduling. Moreover, simulation results show that it converges faster to a near-optimal solution than the basic PSO algorithm, with a 22% increase in resource utilization and a 33% decrease in makespan compared with basic PSO.
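To give readers a concrete picture of how PSO can be applied to task scheduling, the sketch below encodes each particle as a candidate task-to-VM assignment and minimizes the resulting makespan. This is a generic, minimal illustration under assumed task lengths, VM speeds, and PSO hyperparameters, not the authors' algorithm,14 which additionally incorporates a load balancing technique.

```python
# Generic sketch of PSO applied to task-to-VM assignment (not the exact algorithm
# of reference 14): each particle encodes one candidate schedule, and the fitness
# being minimized is the makespan of that schedule.
import random

TASK_LENGTHS = [400, 250, 900, 120, 600, 300, 750, 180]  # illustrative task sizes (MI)
VM_SPEEDS = [1000, 500, 750]                             # illustrative VM speeds (MIPS)
N_PARTICLES, N_ITERS = 20, 100
W, C1, C2 = 0.7, 1.5, 1.5                                # inertia and acceleration weights

def decode(position):
    """Map a continuous position vector to a discrete VM index per task."""
    return [int(abs(x)) % len(VM_SPEEDS) for x in position]

def makespan(assignment):
    """Finish time of the busiest VM under a given task-to-VM assignment."""
    load = [0.0] * len(VM_SPEEDS)
    for task, vm in zip(TASK_LENGTHS, assignment):
        load[vm] += task / VM_SPEEDS[vm]
    return max(load)

def pso():
    dim = len(TASK_LENGTHS)
    pos = [[random.uniform(0, len(VM_SPEEDS)) for _ in range(dim)] for _ in range(N_PARTICLES)]
    vel = [[0.0] * dim for _ in range(N_PARTICLES)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=lambda p: makespan(decode(p)))[:]
    for _ in range(N_ITERS):
        for i in range(N_PARTICLES):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (W * vel[i][d]
                             + C1 * r1 * (pbest[i][d] - pos[i][d])
                             + C2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if makespan(decode(pos[i])) < makespan(decode(pbest[i])):
                pbest[i] = pos[i][:]
                if makespan(decode(pbest[i])) < makespan(decode(gbest)):
                    gbest = pbest[i][:]
    return decode(gbest), makespan(decode(gbest))

if __name__ == "__main__":
    schedule, span = pso()
    print("task -> VM:", schedule, "makespan:", round(span, 2))
```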
As a second contribution, Kun et al,15 in their article "Optimization of Stream-based Live Data Migration Strategy in the Cloud," introduced a real-time live data migration strategy based on PSO. The authors first introduced a nonlinear migration cost model and an imbalance model as metrics to evaluate data migration strategies. They then proposed a loop context and particle grouping as improvement measures: a nested loop context structure provides feedback to improve the stream processing framework, while particle grouping speeds up the convergence rate of PSO. Finally, they rebuilt a stream processing framework implementing these methods in order to demonstrate the performance of the live data migration strategies as well as the instantaneity of migration.

Reliability is an important issue in large-scale distributed platforms; therefore, efficient fault-tolerance mechanisms are necessary. In this direction, Stavrinides and Karatza,16 in their article "The impact of checkpointing interval selection on the scheduling performance of real-time fine-grained parallel applications in SaaS clouds under various failure probabilities," investigated, via simulation, the impact of checkpointing interval selection on the scheduling performance of real-time fine-grained parallel applications with firm deadlines and approximate computations in a SaaS cloud under various failure probabilities. Simulation results showed that, for higher failure probabilities and smaller service times, checkpointing should be more frequent in order to achieve good performance. However, the selected interval should remain above a particular threshold, as unnecessarily frequent checkpointing may degrade performance.
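The trade-off behind these observations can be illustrated with the classic first-order analysis attributed to Young, in which a near-optimal checkpoint interval T satisfies T ≈ sqrt(2·C·M), where C is the checkpointing cost and M the mean time between failures. The sketch below uses this generic textbook approximation with illustrative numbers; it is not the model or the results of the paper.16

```python
# Generic illustration of the checkpoint interval trade-off (not the model of
# reference 16): Young's first-order approximation balances checkpoint overhead
# against the work expected to be lost and recomputed after a failure.
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Near-optimal checkpoint interval: T ~ sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def expected_overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """Rough overhead: checkpoint cost plus expected rework, per interval."""
    rework = (interval_s / 2.0) * (interval_s / mtbf_s)  # ~half an interval lost per failure
    return (checkpoint_cost_s + rework) / interval_s

if __name__ == "__main__":
    C, MTBF = 30.0, 3600.0                    # illustrative: 30 s checkpoint cost, 1 h MTBF
    t_opt = young_interval(C, MTBF)
    for t in (60.0, t_opt, 1800.0):           # too frequent, near-optimal, too rare
        print(f"interval {t:7.1f}s -> overhead ~{expected_overhead_fraction(t, C, MTBF):.1%}")
```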
Kraemer et al,17 in their article "Reducing the number of response time Service Level Objective violations by a cloud-HPC convergence scheduler," introduced a scheduling strategy for migrating jobs from the cloud environment to the HPC environment. The main aim is to reduce the number of response time violations of cloud jobs without interfering with HPC job execution. The authors evaluated the proposed job scheduling strategy in different execution scenarios using the SimGrid simulator, and the reported results showed no response time violations.

As a fifth contribution in this research area, Mahmoudi et al,18 in their article "Towards a Smart Selection of Resources in the Cloud for Low-energy Multimedia Processing," presented an image processing framework composed of software kernels and a strategy for selecting the most suitable hardware target for execution in a multi-CPU/multi-GPU cloud-based platform, according to the users' workload (eg, single or multiple images, multiple videos, or real-time video). Experiments were conducted, and the reported results showed that applications can be integrated in an adapted way so as to reduce both computing time and energy consumption. The authors also stated that the platform can be used to exploit the shared applications without extra overhead; in particular, it frees users from having to download, install, and configure the corresponding software and hardware.

Privacy and security in cloud computing are among the most important issues that still need to be addressed, especially for emerging IoT and big data applications. As a contribution in this field, Furfaro et al,19 in their article "Cybersecurity Compliance Analysis as a Service: Requirements Specification and Application Scenarios," highlighted some specific requirements that have to be taken into account when modeling a cloud service for cybersecurity compliance analysis (CCA). They mainly adopted a recently proposed requirements methodology, called GOReM (goal-oriented requirements methodology), to support the conceptualization and subsequent implementation of CCA services. Using two different application scenarios, the authors show the importance of including and controlling the requirements stemming from the rules and regulations that govern the external/internal worlds of the considered context in every specific country where the cloud service is to be provided. In both scenarios, GOReM allows grasping and understanding the many complex issues involved in providing secure cloud services to worldwide clients, including the legal aspects that service providers have to consider.

As a second contribution, Braeken and Touhafi,20 in their article "Autonomous Anonymous User Authentication and its Application in V2G," introduced two user-friendly authentication protocols able to derive the required security material on the user's side without the need for a secure channel between users and the registration center. These protocols avoid separate out-of-band channels during the registration process for establishing the required security material. Furthermore, no secret key material needs to be shared between the server and the registration center. The computational effort required to complete the security operations is reasonable compared with previously proposed systems. Finally, the authors applied both protocols in the V2G domain: the first protocol allows secure monitoring, and the second enables the secure charging and discharging of electric vehicles with the smart grid.

New techniques and applications are also being developed and experimented with for cloud computing. For example, Touhafi et al,21 in their article "CoderLabs: A Cloud-based platform for Real-Time Online Labs with User Collaboration," introduced a web-based remote lab composer that allows the interconnection of, and data exchange between, remote laboratories. In fact, users are able to use remote labs simultaneously and in a collaborative manner. In this platform, the authors show that it is possible to use Google Coder to develop, change, or create a user interface for a remote experiment to be shared in the cloud. The platform consists mainly of a drag-and-drop lab composer, which allows lab developers to use standard widgets, data-visualization tools, and data ports to compose complex remote labs. A lab-composer engine was also developed to automate the coupling of the physical instances and to collect the data for possible visualization.

As another contribution in this direction, Lachhab et al,22 in their article "Performance Evaluation of Linked Stream Data Processing Engines for Situational Awareness Applications," highlighted the importance of complex event processing (CEP) engines for the gathering and real-time analysis of data streams. The authors first compared and evaluated the performance of three CEP engines widely used by researchers for linked stream data processing: CQELS, C-SPARQL, and ETALIS. This evaluation was carried out using two existing benchmarks, CityBench and SP2Bench. Results show the efficiency and scalability of these CEP engines for both social-based data (SP2Bench) and physical-based data (CityBench), and that ETALIS outperforms CQELS and C-SPARQL in terms of throughput and memory utilization. Finally, the authors introduced a platform that integrates ETALIS for situation/context monitoring in energy-efficient buildings; the obtained results showed its usefulness in extracting situations for developing context-driven control approaches.

In the same direction, Samadi et al,23 in their article "Performance comparison between Hadoop and Spark frameworks using HiBench benchmarks," presented and discussed a performance comparison between two popular frameworks used for big data processing, Hadoop MapReduce and Apache Spark. The authors used the HiBench benchmark suite, an experimental approach for measuring the effectiveness of a computer system. Mainly, the WordCount workload with different data sizes was used to compute the main metrics: execution time, throughput, and speedup. The experimental results showed that the performance of these frameworks varies significantly with the applications' workloads. In most cases, Spark outperforms Hadoop, especially when dealing with large amounts of data, but it requires higher memory allocation.
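For readers unfamiliar with the WordCount workload used in this comparison, the sketch below shows its essence in PySpark; the input path and application name are placeholders, and HiBench's data generators and measurement harness are not reproduced here.

```python
# Minimal PySpark WordCount, the style of workload exercised by HiBench in the
# comparison above; the input path and application name are placeholders.
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
lines = spark.sparkContext.textFile("hdfs:///path/to/input")  # placeholder path

counts = (lines.flatMap(lambda line: line.split())            # split lines into words
               .map(lambda word: (word, 1))                   # emit (word, 1) pairs
               .reduceByKey(add))                              # sum counts per word

for word, count in counts.take(10):                           # print a sample of the result
    print(word, count)

spark.stop()
```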
This special issue thus provides new results from research in the fields of cloud computing, IoT, and big data technologies. In particular, the manuscripts in this issue focus on job scheduling, resource optimization, privacy and security, and performance evaluation. We hope that readers will benefit from the research works presented in this special issue and will contribute to these fast-growing research areas.24

The guest editors would like to thank all of the authors who submitted their papers to this special issue. We also thank all reviewers for their time and for the tangible work they put into successfully completing the reviewing process. We sincerely thank the Editor-in-Chief of this journal, Prof. G. Fox, for the opportunity to prepare this special issue, for his assistance during its preparation, and for giving the authors the opportunity to publish their works in Concurrency and Computation: Practice and Experience. Many thanks also go to the CCPE Editorial Board and the journal's staff for their support. Finally, we thank the following Editorial Committee members for their professional and timely reviews: Munir Kashif (Saudi Arabia), Zine-Dine Khalid (Morocco), Mahmoudi Sidi Ahmed (Belgium), J. Garcia Blas Francisco (Spain), Margalef Tomas (Spain), Durillo Juan (Austria), Marzolla Moreno (Italy), Stavrinides Georgios (Greece), Ma Kun (China), Karatza Helen (Greece), Trystram Denis (France), Tadonki Claude (France), Saad Sultan (Portugal), Petcu Dana (Romania), Braeken An (Belgium), and Takahashi Takeshi (Japan).
