Abstract
A traditional Data Warehouse is a multidimensional repository. It stores nonvolatile, subject-oriented, integrated, time-variant, and non-operational data gathered from multiple heterogeneous data sources. Traditional Data Warehouse architecture must be adapted to deal with the new challenges imposed by the abundance of data and the current big data characteristics, including volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks, including availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that equips the traditional Data Warehouse with the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, providing a hybrid solution in which the two complement each other. The main advantage of the proposed architecture is that it integrates the existing features of traditional Data Warehouses with the big data capabilities gained by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.
Highlights
A data warehouse (DW) has many benefits: it enhances Business Intelligence, data quality, and consistency, saves time, and supports historical data analysis and querying [1]
It adds features and capabilities that facilitate working with big data technologies and tools (Hadoop, Data Lake, Delta Lake, and Apache Spark) in a complementary way to support and enhance the existing architecture
It is an extra storage layer that brings reliability to data lakes built on the Hadoop Distributed File System (HDFS) and cloud storage [31]
Summary
A data warehouse (DW) has many benefits: it enhances Business Intelligence, data quality, and consistency, saves time, and supports historical data analysis and querying [1]. In the age of big data, with the massive increase in data volume and types, there is a great need for more adequate architectures and technologies to deal with it. We propose a new DW architecture called Lake Data Warehouse Architecture. Lake Data Warehouse Architecture is a hybrid system that preserves the traditional DW features. It adds features and capabilities that facilitate working with big data technologies and tools (Hadoop, Data Lake, Delta Lake, and Apache Spark) in a complementary way to support and enhance the existing architecture. Our proposed contribution solves several issues that arise when integrating data from big data repositories, such as integrating the traditional DW technique, the Hadoop framework, and Apache Spark.
Published in: International Journal of Advanced Computer Science and Applications