Abstract

In this paper, we propose an adaptive, real-time approach to resolving the latency and semantic heterogeneity problems of real-time financial data integration. Motivated by constraints we faced in projects requiring real-time integration and analysis of massive financial data, we followed a new approach that combines a hybrid financial ontology, resilient distributed datasets, and real-time discretized streams. We build a real-time data integration pipeline that avoids the classic problems of Extract-Transform-Load (ETL) tools: data processing latency, functional misunderstandings, and metadata heterogeneity. This approach contributes to improving reporting quality and availability within short time frames, which motivated our use of Apache Spark. We studied ETL concepts, data warehousing fundamentals, big data processing techniques, and container-oriented clustering architectures in order to replace the classic data integration and analysis process with our new concept, resilient distributed DataStream for online analytical processing (RDD4OLAP) cubes, which are consumed through Spark SQL or Spark Core primitives.
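To make the pipeline concrete, the sketch below shows one plausible shape of such an architecture in Scala: a discretized stream of financial records is read from Apache Kafka, each micro-batch arrives as a resilient distributed dataset, and the transformed batch is queried with Spark SQL. The topic name `transactions`, the comma-separated record layout, and the field names are illustrative assumptions, not details taken from the paper.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object FinancialStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd4olap-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5 s micro-batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "finance-integration"
    )

    // Discretized stream of raw records from an assumed "transactions" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Each micro-batch is an RDD; transform it and expose it to Spark SQL
    stream.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      val ledger = rdd.map(_.value.split(','))                  // assumed CSV layout
        .collect { case Array(acct, item, amt) => (acct, item, amt.toDouble) }
        .toDF("account", "statement_item", "amount")

      ledger.createOrReplaceTempView("ledger_batch")
      spark.sql(
        "SELECT statement_item, SUM(amount) AS total FROM ledger_batch GROUP BY statement_item"
      ).show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The 5-second batch interval is an arbitrary choice for the sketch; in practice it would be tuned against the latency targets the paper sets for reporting availability.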

Highlights

  • The main goal of building an ETL pipeline is production data integration

  • Business intelligence: this architecture adds value over classic business intelligence (BI) tools through real-time data acquisition with Apache Kafka, the rich programming interface and clustered computing of Apache Spark for transformation, and adaptive metadata models for financial data structures built on a hybrid ontology approach combining the Financial Industry Business Ontology (FIBO) with information system local ontologies (ISLO); see the metadata-mapping sketch after this list

  • Combining a big data processing approach with a hybrid financial ontology is the best approach among existing real-time architectures
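
The second highlight rests on a hybrid metadata model. The minimal Scala sketch below illustrates the general idea of reconciling a local information-system vocabulary (ISLO) with shared FIBO concepts through a lookup table; the concept labels and field names are invented for illustration, and a real implementation would derive the mappings from the ontologies themselves.

```scala
object HybridOntologyDemo {
  // ISLO side: field names as they appear in one source system;
  // FIBO side: the shared concept each field denotes. All labels are illustrative.
  val isloToFibo: Map[String, String] = Map(
    "acct_no"   -> "fibo-fbc:Account",
    "tx_amt"    -> "fibo-fnd:MonetaryAmount",
    "cpty_name" -> "fibo-fbc:Counterparty"
  )

  // Rewrite a raw record's keys into the canonical FIBO vocabulary,
  // leaving unmapped fields untouched for later curation.
  def normalize(record: Map[String, String]): Map[String, String] =
    record.map { case (field, value) => isloToFibo.getOrElse(field, field) -> value }

  def main(args: Array[String]): Unit = {
    val raw = Map("acct_no" -> "FR-100245", "tx_amt" -> "1530.75")
    println(normalize(raw))
    // Map(fibo-fbc:Account -> FR-100245, fibo-fnd:MonetaryAmount -> 1530.75)
  }
}
```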



Introduction

The main goal of building an ETL pipeline is production data integration. Information systems hold large and varied datasets which, in most cases, follow a relational schema [1]. Companies use several tools to prepare data before loading it into a data warehouse; Microsoft SSIS, IBM Cognos, SAP Business Objects, Pentaho Data Integration, and others are available for this type of task. A data warehouse provides structured, cleansed data for reporting, yielding the metrics that allow firms to make sound business decisions [2]. Financial reporting is based on financial metrics that allow internal and external regulatory and auditing bodies to evaluate a company’s health. These metrics are extracted from a financial data warehouse that contains datamarts of financial statements, such as the statement of financial position.
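For contrast with the streaming pipeline sketched above, a classic batch ETL step of the kind the tools just listed perform can be approximated in a few lines of Spark: extract from a relational source over JDBC, transform into a datamart-style aggregate, and load into warehouse storage. The connection URL, table names, and column names below are placeholders, not details from the paper.

```scala
import org.apache.spark.sql.SparkSession

object ClassicEtlJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("finance-batch-etl").getOrCreate()

    // Extract: read a relational fact table over JDBC (connection details are placeholders)
    val ledger = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://erp-host:5432/erp")
      .option("dbtable", "public.general_ledger")
      .option("user", "etl")
      .option("password", sys.env("ETL_PWD"))
      .load()

    // Transform: cleanse and aggregate into a statement-of-financial-position datamart
    val positions = ledger
      .filter(ledger("amount").isNotNull)
      .groupBy("account_class")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "balance")

    // Load: persist to warehouse storage for downstream reporting
    positions.write.mode("overwrite").parquet("hdfs:///dw/financial_position")
  }
}
```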

