A semi-automatic data integration process of heterogeneous databases

Marcello Barbella, Genoveffa Tortora

doi:10.1016/j.patrec.2023.01.007

Open Access

https://doi.org/10.1016/j.patrec.2023.01.007

Journal: Pattern Recognition Letters
Publication Date: Jan 14, 2023
Citations: 6
License type: CC BY-NC-ND

Affiliation: University of Salerno

Abstract

Integrating data from various sources is one of the most difficult problems today, which creates a need for automatic Data Integration (DI) methods. The literature offers both fully automatic and semi-automatic DI techniques, but they require the involvement of IT experts with specific domain skills. In this paper we present a novel DI methodology that does not require IT experts: syntactically or semantically similar entities in the sources are merged by exploiting an information retrieval technique, a clustering method, and a trained neural network. Although the suggested process is completely automated, we plan some interactions with the Company Manager, a figure who is not required to have IT skills and whose only contribution is to define limits and tolerance thresholds during the DI process, based on the interests of the company. In validation, the proposed approach achieved an integration accuracy between 99% and 100%.
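
To make the role of a tolerance threshold concrete, the following is a minimal sketch of threshold-based merging of syntactically similar entity names. It is an illustration only, not the method of the paper: the character-trigram Jaccard similarity and the greedy grouping are stand-ins for the information retrieval technique, clustering method, and neural network mentioned above, and the names `trigrams`, `similarity`, `merge_entities`, and `threshold` are hypothetical.

```python
def trigrams(name):
    """Return the set of character trigrams of a padded, lowercased name."""
    padded = f"  {name.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the trigram sets of two names (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def merge_entities(names, threshold):
    """Greedily group names whose similarity to a group's first member
    is at least `threshold` (an assumed, manager-chosen tolerance)."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing group is similar enough: start a new one.
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    columns = ["customer_name", "customer_names", "invoice_total"]
    print(merge_entities(columns, threshold=0.6))
    print(merge_entities(columns, threshold=0.9))
```

In this sketch a lower threshold merges more aggressively and a higher one keeps near-duplicates apart, which is the kind of trade-off a non-technical decision maker could set without knowing how the similarity is computed.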

Full Text