Abstract

Since the word2vec model was proposed, many researchers have used it as a basis for vectorizing data in their own fields. In social network analysis, the Node2Vec model, which builds on word2vec, vectorizes the nodes and edges of a social network to support tasks such as link prediction and community division. However, a social network is homogeneous in structure. When applied to heterogeneous networks such as knowledge graphs, Node2Vec produces inaccurate predictions and unreasonable vector representations. Specifically, the walk strategy that Node2Vec uses for homogeneous networks is not suitable for heterogeneous networks, because their nodes and edges carry distinguishing features. In this paper, we propose KG2Vec (Heterogeneous Network to Vector), a vector representation method for heterogeneous networks based on random walks and Node2Vec. It addresses the inadequate consideration of full-text semantics and contextual relations in traditional vector representations of knowledge graphs. First, the knowledge graph is reconstructed and a new random walk strategy is applied. Then, two training models and optimizing strategies are proposed to capture the context between entities and relations, providing a semantically full vector representation of the heterogeneous network. The experimental results show that KG2Vec addresses the insufficient consideration of context and the unsatisfactory handling of one-to-many relationships in traditional knowledge graph vectorization, and that it achieves higher accuracy than traditional methods.

Highlights

  • We have reached an era in which everything can be embedded, an approach known as representation learning

  • For instance, in the field of natural language processing (NLP) [1], embedding words into vector representations lets us find a word’s synonyms or estimate the accuracy of a translation; in the field of bioinformatics, a protein chain [2] or transcription factor [3] can be regarded as a network

  • By embedding the proteins into vectors, we can determine whether a chain bond exists; likewise, in social networks, embedding social entities makes link prediction possible

Summary

Introduction

We have reached an era in which everything can be embedded, an approach known as representation learning. KG embedding algorithms such as TransE [6], TransR [7] and TransG [8] are built on this idea. Although these algorithms have proved efficient in many scenarios, we notice that the trans-algorithms treat every triple with the same probability and lack the emphasis that the 2vec models apply during vectorization, which leads to unsatisfactory results. On heterogeneous networks, however, the random walk algorithms of the 2vec models produce inaccurate embeddings. In KG2Vec, CBOW [10] and Skip-gram [11] are applied in the training process to predict the embeddings of relation nodes and entity nodes. Two training models are proposed for heterogeneous networks: given relations, CBOW is used to predict the context entity; given entities, Skip-gram is used to predict the relation node.
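The pairing of the two training models can be illustrated with a small sketch. It is not the paper's implementation: it assumes that reconstruction turns each relation into its own node between head and tail, it uses a plain uniform walk in place of the paper's walk strategy, and all names (`TRIPLES`, `build_graph`, `random_walk`, `training_pairs`) are hypothetical. It only shows how a walk that alternates entities and relations could yield CBOW-style examples (relations in, entity out) and Skip-gram-style examples (entity in, relation out).

```python
import random

# Toy triples (head, relation, tail); the data is invented for illustration.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "member_of", "EU"),
    ("Germany", "member_of", "EU"),
]


def build_graph(triples):
    """Assumed reconstruction: each triple becomes head -> relation -> tail."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def random_walk(graph, start, num_hops, rng):
    """Uniform walk producing entity, relation, entity, relation, ..."""
    walk = [start]
    current = start
    for _ in range(num_hops):
        edges = graph.get(current)
        if not edges:  # dead end: no outgoing relation
            break
        relation, tail = rng.choice(edges)
        walk.extend([relation, tail])
        current = tail
    return walk


def training_pairs(walk):
    """Build both kinds of examples from one alternating walk.

    CBOW-style:      (neighbouring relations, entity to predict)
    Skip-gram-style: (entity, relation to predict)
    """
    cbow, skipgram = [], []
    for i in range(0, len(walk), 2):  # entities sit at even positions
        entity = walk[i]
        relations = [walk[j] for j in (i - 1, i + 1) if 0 <= j < len(walk)]
        if relations:
            cbow.append((relations, entity))
            skipgram.extend((entity, r) for r in relations)
    return cbow, skipgram


if __name__ == "__main__":
    walk = random_walk(build_graph(TRIPLES), "Paris", 2, random.Random(0))
    cbow, skipgram = training_pairs(walk)
    print(walk)
    print(cbow)
    print(skipgram)
```

In a full pipeline, pairs like these would feed a word2vec-style trainer. The window here is fixed to the immediately adjacent relation nodes, which is a simplification chosen for readability.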

Heterogeneous network representation learning
Heterogeneous network reconstruction
Walk strategy
Training model
Optimizing random walks
Data set
KG2Vec parameter tuning
Result
Conclusion