Abstract

The last decade has been characterized by the collection and availability of unprecedented amounts of data, driven by rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. To process and analyze this data deluge, novel distributed data processing systems based on the dataflow paradigm, such as Apache Hadoop, Apache Spark, and Apache Flink, were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, which prevents large groups of data scientists and analysts from using this technology effectively. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable the declarative specification and automatic parallelization of data analysis programs; the PEEL Framework for transparent and reproducible benchmark experiments on distributed data processing systems; and approaches to foster the interpretability of machine learning models. Finally, we provide an overview of the challenges to be addressed in the second phase of the BBDC.
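To illustrate the kind of declarative, collection-style Scala program that DSLs such as Emma aim to let analysts write, the following minimal sketch expresses word count with ordinary functional combinators. In Emma, an equivalent comprehension would be quoted and compiled into a distributed dataflow for a backend such as Spark or Flink; the sketch below deliberately runs on plain local collections, and all identifiers are illustrative rather than part of Emma's actual API.

```scala
// Illustrative sketch only: declarative word count in plain Scala.
// A deeply embedded DSL like Emma would take a program written in this
// style and parallelize it automatically on a dataflow engine; here the
// logic is executed locally so the example is self-contained and runnable.
object WordCountSketch {

  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))            // tokenize each line
      .filter(_.nonEmpty)                              // drop empty tokens
      .groupBy(identity)                               // group equal words
      .map { case (word, occs) => word -> occs.size }  // count per group

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "that is the question")
    wordCount(lines).toSeq
      .sortBy { case (_, count) => -count }
      .foreach { case (word, count) => println(s"$word\t$count") }
  }
}
```

The point of the declarative style is that the analyst specifies only what to compute (tokenize, group, count); deciding how to partition, shuffle, and parallelize the computation is left to the DSL compiler and the underlying dataflow system.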

