Abstract

Data integration is essential for enriching a database with external information. One effective approach is to match shared identifiers across diverse databases. However, a lack of syntactic interoperability, that is, of the ability to match data based on their syntax, can pose challenges. In this paper, we present a novel method to evaluate and enhance syntactic interoperability while accounting for the associated costs. First, we introduce the linking index and the completeness index as generic measures of fine-grained syntactic interoperability. Second, we analyze the data consistency level of the identifiers using a rule-based framework for data quality assessment. Third, we propose a data integration strategy that balances the cost of fixing data inconsistencies against the resulting benefits, as measured by the linking and completeness indices. The approach is illustrated through two use cases: bibliographic databases and clinical trial registries. The results demonstrate that standardizing identifier representations can significantly improve syntactic interoperability in some scenarios, whereas in others the standardization process yields no improvement and hence discourages integration. By providing a cost–benefit analysis of improving data interoperability, the approach enables data integrators to make informed decisions about the feasibility and advantages of proceeding with data integration.
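To make the cost–benefit idea concrete, the following is a minimal sketch under stated assumptions. It does not reproduce the paper's linking index, completeness index, or rule set, whose definitions are not given here. Instead it uses a hypothetical match rate (the share of distinct source identifiers found in the target) as a stand-in benefit measure, and a hypothetical count of identifiers altered by standardization as a stand-in cost. The DOI-style normalization rule (`standardize`) and the sample data are likewise illustrative assumptions.

```python
import re

# Hypothetical normalization rule for DOI-like identifiers (an assumption,
# not the paper's rule set): trim whitespace, lowercase, drop resolver prefixes.
_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def standardize(identifier: str) -> str:
    """Return a canonical form of a DOI-like identifier (hypothetical rule)."""
    s = identifier.strip().lower()
    return _PREFIX.sub("", s)


def match_rate(source, target) -> float:
    """Hypothetical benefit measure: share of distinct source IDs found in target."""
    src = set(source)
    if not src:
        return 0.0
    return len(src & set(target)) / len(src)


def fixes_needed(ids) -> int:
    """Hypothetical cost proxy: number of identifiers that standardization alters."""
    return sum(1 for x in ids if standardize(x) != x)


# Illustrative sample data, not taken from the paper's use cases.
db_a = ["10.1000/ABC", "https://doi.org/10.1000/xyz", "doi:10.1000/q1 "]
db_b = ["10.1000/abc", "10.1000/xyz", "10.1000/other"]

raw = match_rate(db_a, db_b)
std = match_rate([standardize(x) for x in db_a], [standardize(x) for x in db_b])
cost = fixes_needed(db_a) + fixes_needed(db_b)
print(f"raw: {raw:.2f}, standardized: {std:.2f}, fixes: {cost}")
# prints "raw: 0.00, standardized: 0.67, fixes: 3"
```

In this toy case, three fixes raise the match rate from 0.00 to 0.67. Whether such a gain justifies the fixing effort is exactly the kind of decision the abstract describes, although the paper's own measures and rules would replace these stand-ins.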