Towards a new hybrid approach for building document-oriented data warehouses

Nawfal El Moukhi,Ikram El Azami,Soufiane Hajbi

doi:10.11591/ijece.v12i6.pp6423-6431


Open Access




Abstract


Schemaless databases offer large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in Not only SQL (NoSQL) databases makes using the data for decision analysis more complex and difficult. In this paper, we propose an original approach to building a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user requirements analysis. The first, data-driven step exploits the fast, distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step analyzes the semantics of the decisional requirements expressed in natural language and maps them to the schemas of the collections. At the end of the process, a decisional schema is generated in JavaScript Object Notation (JSON) format, and the data is loaded with the necessary transformations.
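
To make the data-driven step more concrete, the following is a minimal sketch of how a general schema for one collection might be derived by merging the structures of its documents. It is not the authors' algorithm: the function names (`doc_schema`, `merge_schemas`, `collection_schema`), the dotted-path notation, and the sample `orders` collection are all assumptions made for illustration, and the sketch runs in plain Python rather than on Spark.

```python
from collections import defaultdict
from functools import reduce


def doc_schema(doc, prefix=""):
    """Map each field path of one document to the set of type names seen."""
    schema = defaultdict(set)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested document: record the field, then descend into it.
            schema[path].add("object")
            for sub_path, types in doc_schema(value, path + ".").items():
                schema[sub_path] |= types
        elif isinstance(value, list):
            # Array: record the field, then describe its elements under "[]".
            schema[path].add("array")
            for item in value:
                if isinstance(item, dict):
                    for sub_path, types in doc_schema(item, path + "[].").items():
                        schema[sub_path] |= types
                else:
                    schema[path + "[]"].add(type(item).__name__)
        else:
            schema[path].add(type(value).__name__)
    return dict(schema)


def merge_schemas(left, right):
    """Union two per-document schemas (associative and commutative)."""
    merged = {path: set(types) for path, types in left.items()}
    for path, types in right.items():
        merged.setdefault(path, set()).update(types)
    return merged


def collection_schema(docs):
    """Fold the per-document schemas into one general schema."""
    return reduce(merge_schemas, map(doc_schema, docs), {})


# Hypothetical collection with heterogeneous documents.
orders = [
    {"_id": 1, "customer": {"name": "Ana"}, "total": 12.5},
    {"_id": 2, "customer": {"name": "Bo", "city": "Rabat"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"_id": "3", "total": 7},
]

schema = collection_schema(orders)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Because `merge_schemas` is associative and commutative, the same fold could in principle be expressed as a distributed map followed by a reduce over the documents of a collection, which is the kind of workload the abstract assigns to Spark. How the paper actually resolves conflicting types or optional fields is not stated in the abstract, so this sketch simply keeps every type observed for each path.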
