Abstract
The cultural world offers a staggering amount of rich and varied metadata on cultural heritage, accumulated by governmental, academic, and commercial players. However, the variety of involved institutions means that the data are stored in as many complex and often incompatible models and standards, which limits its availability and explorability by the greater public. The adoption of Linked Open Data technologies allows a strong interlinking of these various databases as well as external connections with existing knowledge bases. However, as they often contain references to the same entities, the delicate issue of entity alignment becomes the central challenge, especially in the absence or scarcity of unique global identifiers. To tackle this issue, we explored two approaches, one based on a set of heuristic rules and one based on masked language models, or masked language models (MLMs). We compare these two approaches, as well as different variations of MLMs, including some models trained on a different language, and various levels of data cleaning and labeling. Our results show that heuristics are a solid approach but also that MLM-based entity alignment obtains better performance coupled with the fact that it is robust to the data format and does not require any form of data preprocessing, which was not the case of the heuristic approach in our experiments.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.