Abstract

We consider the prediction of future research collaborations as a link prediction problem applied to a scientific knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph kernel-based approaches on the performance of an ML model for the link prediction problem, and (iii) proposes a three-phase pipeline that enables the exploitation of structural and textual information, as well as of pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms using accuracy, recall, and precision as our performance metrics. Finally, we empirically test our approach on the link prediction problem using various feature combinations. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in these performance metrics for the prediction of future research collaborations.

Highlights

  • The development of knowledge graph-based approaches for predicting future research collaborations has received increasing attention in recent years [1,2]

  • We investigate whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model

  • We study the effect of previously proposed graph kernel-based approaches on the performance of a Machine Learning (ML) model for the link prediction problem


Summary

Introduction

The development of knowledge graph-based approaches for predicting future research collaborations has received increasing attention in recent years [1,2]. The majority of the existing knowledge graph-based approaches build on concepts and methods from graph theory to infer knowledge that is not explicitly provided, exploiting the structural characteristics of the corresponding research graph [4]. These approaches ignore the extremely useful (but unstructured) textual data that are in most cases available in documents such as scientific articles and reports [5]; as a consequence, they are not able to meaningfully incorporate both structural and textual information into their knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing (NLP) techniques.
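To make the structure-only setting concrete, the following is a minimal sketch of a classical link prediction index, the Jaccard coefficient, applied to a co-authorship graph. It is an illustration only and not the pipeline proposed in this work: the toy graph, the author labels, and the function names `jaccard` and `rank_candidate_links` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy co-authorship graph: author -> set of co-authors.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
    "E": {"F"},
    "F": {"E"},
}


def jaccard(graph, u, v):
    """Jaccard coefficient of the neighbour sets of authors u and v."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0
    return len(graph[u] & graph[v]) / len(union)


def rank_candidate_links(graph):
    """Score every pair of authors that is not yet linked, best first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]
    ]
    scored = [(jaccard(graph, u, v), u, v) for u, v in candidates]
    # Sort by descending score, then by author labels for a stable order.
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


ranking = rank_candidate_links(coauthors)
for score, u, v in ranking[:3]:
    print(f"{u}-{v}: {score:.2f}")
```

In this toy graph the pair A and D shares all of its co-authors and is ranked first, while pairs with no common co-authors score zero. A score of this kind depends only on graph structure, so it cannot draw on anything written in the authors' papers.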

Graph Related Concepts
Graph Measures and Indices
Graph Kernels
Pyramid Match Graph Kernel
Propagation Kernel
Graph-Based Text Representations
Word Embeddings
Predicting Future Research Collaborations
The Proposed Approach
Feature Extraction
Link Prediction
Experimental Evaluation
Evaluation Metrics
The CORD-19 Dataset
Generation of Datasets for Predicting Future Research Collaborations
Snapshots
Baseline Feature Combinations
Evaluation Results
Concluding Remarks
