Abstract

A traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, and non-operational data gathered from multiple heterogeneous data sources. Traditional Data Warehouse architecture must be adapted to deal with the new challenges imposed by the abundance of data and the characteristics of big data: volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks, including availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that equips the traditional Data Warehouse to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, providing a hybrid solution in which the two complement each other. The main advantage of the proposed architecture is that it combines the existing features of traditional Data Warehouses with the big data capabilities gained by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.

Highlights

  • A data warehouse (DW) has many benefits: it enhances Business Intelligence, data quality, and consistency, saves time, and supports historical data analysis and querying [1]

  • The proposed architecture adds features and capabilities that facilitate working with big data technologies and tools (Hadoop, Data Lake, Delta Lake, and Apache Spark) in a complementary way to support and enhance the existing architecture

  • Delta Lake is an extra storage layer that brings reliability to data lakes built on the Hadoop Distributed File System (HDFS) and cloud storage [31]


Summary

INTRODUCTION

A data warehouse (DW) has many benefits: it enhances Business Intelligence, data quality, and consistency, saves time, and supports historical data analysis and querying [1]. In the age of big data, with the massive increase in data volume and types, there is a great need for more adequate architectures and technologies to deal with it. We propose a new DW architecture called Lake Data Warehouse Architecture, a hybrid system that preserves the traditional DW features while adding features and capabilities that facilitate working with big data technologies and tools (Hadoop, Data Lake, Delta Lake, and Apache Spark) in a complementary way to support and enhance the existing architecture. Our proposed contribution solves several issues that arise when integrating data from big data repositories, such as combining the traditional DW technique, the Hadoop framework, and Apache Spark.
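To make the Spark/Delta Lake side of this hybrid concrete, the sketch below shows one way to configure a PySpark session with Delta Lake support and to land data as a Parquet-based lake table that is then promoted to a Delta table. This is an illustrative configuration sketch, not code from the paper: the paths (`/lake/raw/sales`, `/lake/delta/sales`), the `sales` schema, and the application name are assumptions, and running it requires a Spark installation with the `delta-spark` package on the classpath.

```python
# Sketch (assumed setup): SparkSession configured for Delta Lake, writing a
# Parquet data-lake table and promoting it to a Delta table for reliability.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lake-dw-sketch")  # hypothetical application name
    # Standard Delta Lake integration settings (delta-spark must be installed)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Land raw records as Parquet files in an illustrative data-lake zone.
sales = spark.createDataFrame(
    [(1, "2024-01-01", 99.5), (2, "2024-01-02", 42.0)],
    ["order_id", "order_date", "amount"],
)
sales.write.mode("overwrite").parquet("/lake/raw/sales")

# Promote the Parquet data to a Delta table: the Delta transaction log is
# what adds ACID guarantees on top of the plain file-based lake.
spark.read.parquet("/lake/raw/sales").write.format("delta") \
    .mode("overwrite").save("/lake/delta/sales")
```

The two-step write mirrors the complementary design described above: Parquet files in HDFS or cloud storage serve as the inexpensive lake layer, while Delta Lake supplies the transactional storage layer over them.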

BACKGROUND
Hadoop Framework and Data Lake
Apache Spark and Delta Lake
RELATED WORKS
THE PROPOSED LAKE DATA WAREHOUSE ARCHITECTURE
Delta Lake architecture with Apache Spark Cloud Environment
Configure Apache Spark
Create a Parquet-based Data Lake Table
Exploring analysis results
CONCLUSION
