Abstract

Ensuring the quality of integrated data is undoubtedly one of the main problems of integrated data systems. In multi-national and historical data integration systems, where the "space" and "time" dimensions play a central role, it is particularly important to build the integration layer so that the final user accesses a layer that is, by design, as complete as possible. In this paper, we propose a method for accessing data in multipurpose data infrastructures, such as data integration systems, that (i) relieves the final user of the need to access single data sources and, at the same time, (ii) maximizes the amount of information available to the user at the integration layer. Our approach is based on completeness-aware integration, which gives the user ready access to all the information that can be obtained from the integrated data system without carrying out a preliminary data quality analysis on each of the databases included in the system. Providing data quality information at the integrated level thus extends the functions of the individual data sources, opening the data infrastructure to additional uses. This may be a first step in moving from data infrastructures towards knowledge infrastructures. A case study on the research infrastructure for science and innovation studies shows the usefulness of the proposed approach.
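
To make the notion of completeness "by design" concrete, here is a minimal sketch in Python. It is purely illustrative, not the paper's formalization: the Coverage type, the (country, year) grid, and the completeness function are our assumptions for how the "space" and "time" dimensions might be annotated at the integration layer.

    from dataclasses import dataclass

    # Hypothetical sketch: completeness of integrated data, broken down by
    # the "space" and "time" dimensions emphasised in the paper.

    @dataclass(frozen=True)
    class Coverage:
        country: str   # "space" dimension
        year: int      # "time" dimension

    def completeness(covered: set[Coverage], required: set[Coverage]) -> float:
        """Fraction of the required (country, year) cells that at least one
        integrated source actually populates; 1.0 means complete by design."""
        return len(covered & required) / len(required) if required else 1.0

    # Suppose an indicator is expected for two countries over 2010-2011 ...
    required = {Coverage(c, y) for c in ("IT", "DE") for y in (2010, 2011)}
    # ... but the integrated sources only cover three of the four cells.
    covered = {Coverage("IT", 2010), Coverage("IT", 2011), Coverage("DE", 2010)}

    print(completeness(covered, required))  # 0.75, shown to the user up front

Publishing such a figure alongside the integrated data is what spares the user the preliminary, source-by-source quality analysis the abstract refers to.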

Highlights

  • In the current big data era, the problems of data integration, harmonization, and above all data quality have increased rather than diminished (Ekbia et al., 2015)

  • We first give an overview of the data integration approach we adopt, namely ontology-based data management (OBDM), and then focus on our proposal to explicitly represent the data quality of the integration layer, so as to have full governance of the quality of the data provided by the data integration system

  • An OBDM system is an information management system maintained and used by a given organization, whose architecture has the same structure as a typical data integration system, with the following components: an Integration Layer with an ontology, a Source Layer with a set of data sources, and the mapping between the two

Introduction

In the current big data era, the problems of data integration, harmonization, and above all data quality have increased rather than diminished (Ekbia et al., 2015). Ontology-based data management (OBDM) was introduced about a decade ago as a new way of modeling and interacting with a collection of data sources (see Lenzerini, 2011). In this paradigm, the client of the information system is freed from the need to know how data are structured in concrete resources (databases, software programs, services, etc.), and interacts with the system by expressing her queries and goals in terms of a conceptual representation of the domain of interest, called the ontology. In the general case, such databases are numerous and heterogeneous, each one managed and maintained independently of the others.
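
As a rough illustration of this paradigm (all names are hypothetical; this is a GAV-style sketch, not the paper's own formalization), an OBDM system can be modeled as an ontology, a set of mappings, and an unfolding step that rewrites a query over an ontology concept into queries over the sources:

    from dataclasses import dataclass, field

    @dataclass
    class Mapping:
        """GAV-style assertion: an ontology concept is populated by a source query."""
        concept: str   # e.g. "Researcher" in the ontology
        source: str    # identifier of the underlying database
        query: str     # query over that source, in its native dialect

    @dataclass
    class OBDMSystem:
        ontology: set[str]                 # concept names of the Integration Layer
        mappings: list[Mapping] = field(default_factory=list)

        def unfold(self, concept: str) -> list[tuple[str, str]]:
            """Rewrite a conceptual query into (source, query) pairs.
            The client only ever names the concept; which sources answer
            it, and how, stays hidden behind the mappings."""
            if concept not in self.ontology:
                raise ValueError(f"unknown concept: {concept}")
            return [(m.source, m.query) for m in self.mappings if m.concept == concept]

    system = OBDMSystem(
        ontology={"Researcher", "Publication"},
        mappings=[
            Mapping("Researcher", "db_projects", "SELECT name FROM participants"),
            Mapping("Researcher", "db_patents", "SELECT inventor FROM inventors"),
        ],
    )
    print(system.unfold("Researcher"))  # one conceptual query fans out to both sources

A single query over Researcher thus reaches every mapped source, however numerous and heterogeneous, without the client ever addressing one directly.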
