With the goal of understanding if the information contained in node metadata can help in the task of link weight prediction, we investigate herein whether incorporating it as a similarity feature (referred to as metadata similarity) between end nodes of a link improves the prediction accuracy of common supervised machine learning methods. In contrast with previous works, instead of normalizing the link weights, we treat them as count variables representing the number of interactions between end nodes, as this is a natural representation for many datasets in the literature. In this preliminary study, we find no significant evidence that metadata similarity improved the prediction accuracy of the four empirical datasets studied. To further explore the role of node metadata in weight prediction, we synthesized weights to analyze the extreme case where the weights depend solely on the metadata of the end nodes, while encoding different relationships between them using logical operators in the generation process. Under these conditions, the random forest method performed significantly better than other methods in 99.07% of cases, though the prediction accuracy was significantly degraded for the methods analyzed in comparison to the experiments with the original weights.
Read full abstract