Different knowledge graphs for the same domain are often housed separately on the Web. Effectively linking entities across these graphs is critical for building an open and comprehensive knowledge graph. However, linking entities from different sources has thus far faced several challenges, including the increasingly large volume of data, the heterogeneity of the graphs, and the ambiguity of real-world entities. To address them, we propose a unified framework, LinKG. Specifically, we decouple the problem into different linking tasks based on the unique properties of each type of entity. To link entities that are represented as word sequences, we propose an LSTM-based method to capture word dependencies. To link large-scale entities, we combine hashing with convolutional neural networks for scalable and accurate linking. To link ambiguous entities, we propose heterogeneous graph attention networks that leverage heterogeneous structural information. Finally, to validate the design choices of the different LinKG modules, we characterize the relationships between the linking tasks using single-domain and multi-domain transfer models. Extensive experiments demonstrate the effectiveness of LinKG, which achieves an overall F1-score of 95.15%. Based on these results, we deploy and release the Open Academic Graph (OAG), the largest publicly available heterogeneous academic graph to date.
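To give a rough sense of why hashing helps with large-scale linking, the following is a minimal sketch, not the LinKG implementation. It assumes the entities are titles and uses SimHash, a generic locality-sensitive hashing scheme chosen here only for illustration; the abstract does not say which hashing method LinKG uses. The function names, the 64-bit fingerprint width, and the distance threshold are all hypothetical.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an illustrative choice, not taken from the paper


def tokens(title):
    """Lowercase the title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def token_hash(tok):
    """Map a token to a stable 64-bit integer using the first 8 bytes of MD5."""
    digest = hashlib.md5(tok.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(title):
    """Compute a SimHash fingerprint: each token votes +1/-1 on every bit."""
    counts = [0] * BITS
    for tok in tokens(title):
        h = token_hash(tok)
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # a bit is set only when the positive votes win
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def same_entity(t1, t2, max_dist=3):
    """Hypothetical linking rule: fingerprints within max_dist bits match."""
    return hamming(simhash(t1), simhash(t2)) <= max_dist
```

Under these assumptions, each title is reduced to one integer, so comparing two candidates costs an XOR and a bit count rather than a full string comparison. In a real pipeline the fingerprints would typically be used to shortlist candidates cheaply, with a learned model (a CNN, in the abstract's description) making the final decision.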