Abstract

Knowledge graphs (KGs) have found extensive applications in intelligent systems such as information retrieval. Most research has focused on completing missing knowledge, with little attention paid to detecting errors. Unfortunately, when KGs are customized, the introduction of diverse, unpredictable errors is virtually unavoidable, and these anomalies significantly degrade the performance of downstream applications. Detecting erroneous knowledge is a formidable challenge because ground-truth labels are costly to acquire. In this work, we develop an unsupervised anomaly detection framework named ProMvSD, which adapts to KGs of varying scales via serialization components. To overcome the insufficient contextual information provided by the topological structure, we introduce a large language model as a reasoner to extract prior knowledge from its extensive pre-trained textual data, thereby enhancing the understanding of KGs. An anomalous triple tends to produce a larger semantic gap between the head and tail neighborhoods. To uncover latent anomalies effectively, we propose a multi-view semantic-driven model (MvSD) based on the assumptions of self-consistency and information stability. MvSD jointly estimates the suspiciousness of triples from three hyperviews: node-view semantic contradiction, triple-view semantic gap, and pathway-view semantic gap. Extensive experiments on three English benchmark KGs and a Chinese medical KG demonstrate that, among the top 1% of the most suspicious triples, we detect real anomalies with up to 99.9% accuracy. Furthermore, ProMvSD significantly outperforms state-of-the-art representation-learning baselines, achieving a 29.2% improvement in detecting all anomalies.
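To make the multi-view scoring concrete, the following is a minimal sketch of how per-triple scores from the three hyperviews could be combined into a single suspiciousness ranking; the function name, the weighted-sum aggregation, and the weights are illustrative assumptions, as the abstract does not specify the aggregation used in MvSD.

```python
import numpy as np

def suspiciousness(node_view, triple_view, pathway_view, weights=(1.0, 1.0, 1.0)):
    """Combine three per-triple anomaly scores into one suspiciousness score.

    Each argument is an array of shape (num_triples,) holding the score one
    view assigns to every triple; higher means more anomalous. The weighted
    sum here is a hypothetical aggregation, not the paper's formulation.
    """
    scores = np.stack([node_view, triple_view, pathway_view])  # (3, num_triples)
    w = np.asarray(weights)[:, None]
    return (w * scores).sum(axis=0)  # weighted sum per triple

# Toy usage: rank 1,000 triples and inspect the top 1% most suspicious,
# mirroring the evaluation protocol described in the abstract.
rng = np.random.default_rng(0)
node, tri, path = rng.random(1000), rng.random(1000), rng.random(1000)
scores = suspiciousness(node, tri, path)
top_k = int(0.01 * len(scores))
most_suspicious = np.argsort(scores)[::-1][:top_k]
```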
