Creating Apache Spark virtual clusters in cloud environments using orchestration systems

O Borisenko, R Pastukhov, S Kuznetsov

doi:10.15514/ispras-2016-28(6)-8

Abstract

Apache Spark is a framework that provides fast computations on Big Data using the MapReduce model. Cloud environments make Big Data processing more flexible because they allow virtual clusters to be created on demand. One of the most powerful open-source cloud environments is OpenStack. The main goal of this project is to make it possible to create virtual clusters with Apache Spark and other Big Data tools in OpenStack. There are three approaches to doing this. The first is to use the OpenStack REST APIs to create instances and then deploy the environment. This approach is used by the Apache Spark core team to create clusters in the proprietary Amazon EC2 cloud. Almost the same method has been implemented for OpenStack environments. However, because the OpenStack API changes frequently, this solution has been deprecated since the Kilo release. The second approach is to integrate virtual cluster creation as a built-in OpenStack service. ISP RAS has provided several patches implementing a universal Spark Job engine for OpenStack Sahara and OpenStack Swift integration with Apache Spark as a drop-in replacement for Apache Hadoop. This approach makes Spark clusters available as a service in the PaaS service model. Because OpenStack releases are less frequent than Apache Spark releases, this approach may be inconvenient for developers who use the latest releases. The third solution uses Ansible for orchestration. It is implemented in a loosely coupled way, so that any auxiliary tool can be added or even another cloud environment used. It also makes it possible to choose which Apache Spark and Apache Hadoop versions to deploy in virtual clusters. All the listed approaches are available under the Apache 2.0 license.
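
To make the third approach more concrete, here is a minimal sketch of how cluster parameters could be turned into an INI-style Ansible inventory with a master group, a worker group, and group variables for the chosen versions. It is not taken from the authors' implementation: the `ClusterSpec` class, the `render_inventory` function, the host naming scheme, the variable names `spark_version` and `hadoop_version`, and the version strings in the usage example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Hypothetical description of a virtual cluster (field names are illustrative)."""
    name: str
    workers: int
    spark_version: str
    hadoop_version: str


def render_inventory(spec: ClusterSpec) -> str:
    """Render an INI-style Ansible inventory with one master and N workers."""
    if spec.workers < 1:
        raise ValueError("a cluster needs at least one worker")
    lines = ["[master]", f"{spec.name}-master", "", "[workers]"]
    # Host names are placeholders; a real deployment would use the
    # addresses of the instances created in the cloud.
    lines += [f"{spec.name}-worker-{i}" for i in range(1, spec.workers + 1)]
    # Group variables let playbooks pick which versions to install.
    lines += [
        "",
        "[all:vars]",
        f"spark_version={spec.spark_version}",
        f"hadoop_version={spec.hadoop_version}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, chosen only for the example.
    print(render_inventory(ClusterSpec("demo", 2, "1.6.2", "2.6")))
```

Keeping the versions in inventory variables rather than in the playbooks is one way to get the loose coupling the abstract describes: the same playbooks could then be reused for a different Spark or Hadoop release by changing only the inventory.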


Summary

Introduction

Apache Spark [1] is one of the most mature and best-performing [2] implementations of the MapReduce approach [3]. The project is also part of the Apache Software Foundation's big data processing infrastructure and can interoperate with other projects in it (such as Apache Hadoop [4], YARN [5], Mesos [6], Ignite [7], and others). However, configuring each of these solutions in a distributed environment is very labor-intensive and requires a deep understanding of how each system works and where the systems interact. Manually configuring the interaction of big data analysis tools in a cloud environment is not economically justified, because cloud environments provide resources on demand and charge for the time a resource is used. Under this model, manual configuration of a virtual cluster has several drawbacks at once: first, the user has to pay for what is effectively idle time of the computing resources, and the larger the cluster, the longer configuration takes without automation tools. OpenStack [8] was chosen as the cloud environment because it is the most actively developing project and offers the widest range of capabilities among its counterparts.

Building the solution

Comparison of solutions

Results achieved


Journal: Proceedings of the Institute for System Programming of the RAS	Publication Date: Jan 1, 2016
Citations: 1	License type: cc-by
