The traditional training approach based on negative sampling randomly draws a subset of negative samples, which can easily overlook important negatives and degrade the training of knowledge graph embedding models. Some researchers have therefore explored non-sampling training frameworks that treat all unobserved triples as negative samples to improve training performance. However, both training methods inevitably introduce false negative samples as well as easy-to-separate negative samples that lie far from the model’s decision boundary, and neither accounts for the adverse effects of long-tail entities and relations during training, which limits further gains in training performance. To address these issues, we propose HNSW-KGE, a universal knowledge graph embedding framework based on high-quality negative sampling and weighting. First, we pre-train with the NS-KGE non-sampling framework to quickly obtain initial, relatively high-quality embedding vectors for all entities and relations. Second, guided by these pre-trained embeddings, we design a candidate negative sample set construction strategy that, for every positive triple, samples a fixed number of negatives that are neither false negatives nor easy-to-separate negatives, thereby supplying high-quality negative samples for training. Finally, we weight the loss function according to how frequently the entities and relations of each triple appear, mitigating the adverse effect of long-tail entities and relations on training. Experiments on the benchmark datasets FB15K237 and WN18RR with various knowledge graph embedding models demonstrate that HNSW-KGE achieves better training performance and is versatile, applying to different types of knowledge graph embedding models.
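To make the two core ideas of the abstract concrete, the following is a minimal sketch (not the authors' code) of selecting negatives that are neither false negatives nor easy-to-separate negatives using pre-trained embeddings, and of inverse-frequency loss weighting for long-tail entities and relations. The TransE-style scorer, the threshold values, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of "hard but not false" negative selection and
# frequency-based loss weighting; all concrete choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 1000, 20, 64

# Stand-ins for embeddings obtained from the NS-KGE pre-training stage.
ent_emb = rng.normal(size=(num_entities, dim))
rel_emb = rng.normal(size=(num_relations, dim))

# Observed (positive) triples as (head, relation, tail) index tuples.
positives = {(1, 3, 7), (1, 3, 9), (4, 5, 2)}

def score(h, r, t):
    """TransE-style plausibility score: higher (less negative) = more plausible."""
    return -np.linalg.norm(ent_emb[h] + rel_emb[r] - ent_emb[t])

def candidate_negatives(h, r, t, k=5, easy_threshold=-14.0):
    """For one positive triple, corrupt the tail and keep negatives that are
    neither false negatives (already observed in the KG) nor easy-to-separate
    negatives (scores far below the threshold, i.e. far from the boundary)."""
    cands = []
    for t_neg in rng.choice(num_entities, size=200, replace=False):
        t_neg = int(t_neg)
        if (h, r, t_neg) in positives:
            continue                      # skip false negatives
        s = score(h, r, t_neg)
        if s < easy_threshold:
            continue                      # skip easy-to-separate negatives
        cands.append((t_neg, s))
    cands.sort(key=lambda x: x[1], reverse=True)   # hardest (highest-scoring) first
    return [t_neg for t_neg, _ in cands[:k]]

# Frequency counts: rarer (long-tail) entities/relations get larger loss weights.
ent_freq = np.ones(num_entities)
rel_freq = np.ones(num_relations)
for h, r, t in positives:
    ent_freq[h] += 1; ent_freq[t] += 1; rel_freq[r] += 1

def triple_weight(h, r, t):
    """Inverse-frequency weight applied to this triple's loss term."""
    return 1.0 / np.sqrt(ent_freq[h] * rel_freq[r] * ent_freq[t])

for h, r, t in positives:
    negs = candidate_negatives(h, r, t)
    print((h, r, t), "weight=%.3f" % triple_weight(h, r, t), "negatives:", negs)
```

In this sketch the high-quality candidate set is built once from the pre-trained embeddings and the resulting per-triple weight would multiply that triple's contribution to whatever loss the chosen embedding model uses; the actual HNSW-KGE procedure may differ in its scoring function, thresholds, and weighting scheme.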