Abstract

For more than 30 years, data warehouses (DWs) have attracted particular interest both in practice and in research. This success is explained by their ability to adapt to their evolving environment. One of the latest challenges for DWs is opening their frontiers to external data sources in addition to internal ones. The development of linked open data (LOD) as external sources is an excellent opportunity to create added value and enrich the analytical capabilities of DWs. However, the incorporation of LOD into the DW must be carefully managed. In this paper, we are interested in managing the evolution of DW systems that integrate internal sources and external LOD datasets. LOD are particular in that they contribute to the evolution of the DW at several levels: (i) the source level, (ii) the DW schema level, and (iii) the DW design-cycle constructs. In this context, we have to ensure this co-evolution, because conventional evolution approaches are adapted neither to this new kind of source nor to the semantic constructs underlying LOD sources. One way of tackling this co-evolution issue is to ensure the traceability of DW constructs over the whole design cycle. Our approach is evaluated using the LUBM (Lehigh University Benchmark), different LOD datasets (DBpedia, YAGO, etc.), and the Oracle 12c database management system (DBMS) for DW deployment.

Highlights

  • Endowing data with semantics is a crucial task for many applications

  • We present an event-based approach for managing the co-evolution of the whole cycle, including a co-evolution mechanism for annotating data warehouse (DW) constructs impacted by evolution events at the semantic level

  • Building data warehouses (DWs) with internal and external data published in linked open data (LOD) formats requires managing the evolution of DWs that deal with “open world” sources and their specific characteristics


Summary

Introduction

Endowing data with semantics is a crucial task for many applications. With the emergence of ontologies and linked open data (LOD) in different fields (e.g., Cyc, DBpedia, Freebase, and YAGO), the amount of semantic data is rapidly increasing in various domains (e.g., the New York Times, BBC, and Thomson Reuters semantic data), and such data are increasingly involved in data-centric systems. Traditional DWs manage evolution either from a source perspective or from a requirements perspective, ignoring the interrelated artifacts that compose the DW design cycle. In contrast, LOD can be integrated at different steps of the DW cycle, possibly impacting different artifacts. Such integration scenarios occur during the core phases of DW design, namely: requirements definition; the extract–transform–load (ETL) phase, composed of a set of ETL processes that transform heterogeneous data into the DW representation; and the deployment phase, which implements the DW system.
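Traceability links between artifacts of these phases are what allow an evolution event at one level (for example, a changed LOD property) to be traced to the constructs it affects elsewhere in the cycle. The Python sketch below illustrates only this general idea of change-impact analysis over traceability links; it is not the authors' traceability model. The `TraceabilityGraph` class, its methods, the artifact labels (e.g. `source:dbpedia.populationTotal`, `etl:extract_city`), and the use of breadth-first traversal as the propagation strategy are all hypothetical assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TraceabilityGraph:
    """Hypothetical traceability model: maps an artifact to the artifacts derived from it."""
    links: dict[str, set[str]] = field(default_factory=dict)

    def add_trace(self, source: str, derived: str) -> None:
        # Record that `derived` was produced from `source` somewhere in the design cycle.
        self.links.setdefault(source, set()).add(derived)

    def impacted_by(self, changed: str) -> list[str]:
        """Return every artifact reachable from `changed`, in breadth-first order.

        Assumption: impact propagates transitively along traceability links.
        The changed artifact itself is not reported, and cycles are tolerated.
        """
        seen: set[str] = {changed}
        order: list[str] = []
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            # Sort neighbours so the result is deterministic.
            for nxt in sorted(self.links.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Hypothetical artifacts spanning source, ETL, schema, deployment, and requirements.
g = TraceabilityGraph()
g.add_trace("source:dbpedia.populationTotal", "etl:extract_city")
g.add_trace("source:lubm.University", "etl:extract_university")
g.add_trace("etl:extract_city", "schema:City.population")
g.add_trace("schema:City.population", "deploy:CITY.POPULATION")
g.add_trace("req:R1", "schema:City.population")

# An evolution event on the LOD property reaches the ETL, schema, and deployment levels.
print(g.impacted_by("source:dbpedia.populationTotal"))
# ['etl:extract_city', 'schema:City.population', 'deploy:CITY.POPULATION']
```

Keeping the links generic across phases means the same traversal serves an event raised at any level, whether on an LOD property or on a requirement such as the hypothetical `req:R1`, and returns the downstream constructs to review before a change is applied.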

DW Design from Internal and External Sources
DW Evolution Management
Co-Evolution Management
DW Traceability Model for Managing Co-Evolution
Evolution Management Approach
Propagating the changes
Applying the changes
Case Study and Experiments
Conclusions