DLBench+: A benchmark for quantitative and qualitative data lake assessment

Pegdwendé N Sawadogo, Jérôme Darmont

doi:10.1016/j.datak.2023.102154

Abstract

In the last few years, the concept of data lake has become trendy for data storage and analysis, and several approaches have been proposed to build data lake systems. However, such proposals are difficult to evaluate, as there are no commonly shared criteria for comparing data lake systems. In this paper, we therefore introduce DLBench+, a benchmark to evaluate and compare data lake implementations that support textual and/or tabular contents. More concretely, we propose a data model made of both textual and CSV documents, a workload model composed of a set of various tasks, and a set of performance-based metrics, all relevant to the context of data lakes. Beyond a purely quantitative assessment, we also propose a methodology to qualitatively evaluate data lake systems through the assessment of user experience. As a proof of concept, we use DLBench+ to evaluate an open-source data lake system that we developed.

Journal: Data & Knowledge Engineering
Publication Date: Feb 21, 2023
Citations: 2

Similar Papers

Benchmarking Data Lakes Featuring Structured and Unstructured Data with DLBench
Pegdwendé N Sawadogo ... Jérôme Darmont
01 Jan 2020

Leveraging the Data Lake: Current State and Challenges
Corinna Giebler ... Christoph Gröger
01 Jan 2019

A Zone-Based Data Lake Architecture for IoT, Small and Big Data
Yan Zhao ... Franck Ravat
14 Jul 2021

The concept of an intelligent data lake management system: machine consciousness and a universal data model
Artem A Sukhobokov ... Alyona K Tsvetkova
Procedia Computer Science | VOL. 213
01 Jan 2021
