Abstract

Data integration from external data sources and independent IT systems has recently received increasing attention in IT departments as well as at management level, in particular with regard to data integration in federated database systems. One example is commercial research information systems (RIS), which regularly import, cleanse, transform and prepare for analysis the research information of institutions from a variety of databases. In addition, all of these steps must be carried out with assured quality. As several internal and external data sources are loaded for integration into the RIS, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned up. Data quality is therefore always an important factor for successful data integration. Removing data errors (such as duplicates, inconsistent data and outdated data) and harmonizing the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes. Data is extracted from the source systems, transformed and loaded into the RIS. At this point, conflicts between different data sources are identified and resolved, and data quality issues arising during integration are eliminated. Against this background, our paper presents the process of data transformation in the context of RIS, which provides an overview of the quality of research information in an institution’s internal and external data sources during its integration into the RIS. In addition, it addresses the question of how to control and improve data quality issues during the integration process in RIS.
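The extract, transform and load steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record fields (`title`, `year`, `updated`), the sample values and the rule "keep the most recently updated record per title and year" are all assumptions chosen only to show duplicate removal and value harmonization during the transform step.

```python
from datetime import date, datetime

# Hypothetical source records; field names and values are illustrative only.
SOURCE_RECORDS = [
    {"title": "  Data Quality in RIS ", "year": "2018", "updated": "2019-03-01"},
    {"title": "data quality in ris", "year": "2018", "updated": "2019-05-12"},
    {"title": "ETL for Research Information", "year": "2017", "updated": "2018-01-20"},
]


def extract(source):
    """Extract step: copy the raw records out of the source system."""
    return [dict(record) for record in source]


def transform(records):
    """Transform step: harmonize values and drop duplicates.

    Assumed rule: two records are duplicates if their whitespace-normalized,
    case-insensitive title and their year match; the most recently updated
    record wins.
    """
    latest = {}
    for record in records:
        clean = {
            "title": " ".join(record["title"].split()),  # collapse whitespace
            "year": int(record["year"]),
            "updated": datetime.strptime(record["updated"], "%Y-%m-%d").date(),
        }
        key = (clean["title"].casefold(), clean["year"])
        if key not in latest or clean["updated"] > latest[key]["updated"]:
            latest[key] = clean
    return list(latest.values())


def load(records, target):
    """Load step: append the cleaned records to the target store (a list here)."""
    target.extend(records)
    return target


ris = load(transform(extract(SOURCE_RECORDS)), [])
```

In this sketch the first two source records collapse into one entry, and the outdated version is discarded in favour of the one updated later. A real RIS would apply institution-specific matching rules instead of this simple key.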

Highlights

  • In recent years, there has been a new trend in which universities, research institutions and researchers capture, integrate, store and analyze their research information in a research information system (RIS)

  • Further definitions of the term research information systems (RIS) in the literature can be found in the related papers [4,5,6]

  • Detailed information on measuring data quality in RIS can be found in the related paper [1]

  • Research information from an institution usually needs to be available for different application [...]

Summary

Introduction

There has been a new trend in which universities, research institutions and researchers capture, integrate, store and analyze their research information in a research information system (RIS).

Causes of Data Quality Issues in the Context of Integrating Research Information

As the number of data sources, systems and interfaces in the research management process grows, the likelihood of data quality issues increases as well. Detailed information on these problems and their solution can be found in the related paper [19]. Another common cause of data inconsistency in RIS is that the research information exists in different data models and structures and is collected independently. In a RIS, research information from all the relevant data sources and information systems of an institution (this level includes, for example, databases from the administration) is brought together and kept in a uniform schema.
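Bringing independently collected records into a uniform schema can be sketched as a field mapping per source. The example below is only an illustration under stated assumptions: the source names (`hr_db`, `pub_db`), their field names, the target schema and the sample values are hypothetical and do not come from the paper.

```python
# Hypothetical mappings from each source's own field names to one uniform schema.
FIELD_MAPS = {
    "hr_db": {"surname": "last_name", "forename": "first_name", "dept": "organisation"},
    "pub_db": {"AuthorLast": "last_name", "AuthorFirst": "first_name", "Affiliation": "organisation"},
}

UNIFORM_FIELDS = ("last_name", "first_name", "organisation")


def to_uniform(record, source):
    """Rename the fields of one source record to the uniform schema."""
    mapping = FIELD_MAPS[source]
    unified = {target: record.get(src) for src, target in mapping.items()}
    unified["source"] = source  # keep provenance for later conflict handling
    return unified


def find_conflicts(a, b):
    """Return the uniform fields on which two unified records disagree."""
    return [field for field in UNIFORM_FIELDS if a[field] != b[field]]


# The same person as recorded in two independently maintained systems.
hr_row = {"surname": "Meyer", "forename": "Anna", "dept": "Physics"}
pub_row = {"AuthorLast": "Meyer", "AuthorFirst": "Anna", "Affiliation": "Dept. of Physics"}

unified_hr = to_uniform(hr_row, "hr_db")
unified_pub = to_uniform(pub_row, "pub_db")
conflicts = find_conflicts(unified_hr, unified_pub)
```

Once both records share the same field names, differences such as the two spellings of the organisation become directly comparable and can be flagged for resolution before loading.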

Integrating
Key Treatment
Conversion of Encodings
Unification of Strings and Dates
Separation and Combination of Attribute Values
Calculation of Derived Values
Aggregation
Example
Conclusions
