Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning.

Luca Cappelletti, Guy Karlebach, Tudor Oprea, Justin Reese, Giorgio Valentini, Elena Casiraghi, J Harry Caufield, Lauren Rekerle, Peter N Robinson, Ben Coleman, Vida Ravanmehr, Leigh Carmody, Tommaso Fontana, Peter Hansen, Christopher J Mungall, Leonard Spranger, Jeremy Yang

doi:10.1093/bioadv/vbae036

Abstract

Graph representation learning is a family of related approaches that learn low-dimensional vector representations, called embeddings, of nodes and other graph elements. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then the performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
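The contrast between uniform and degree-aware negative sampling can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the repository linked above); it is one simple way to realize the idea, under stated assumptions: the graph is undirected, negative endpoints are drawn with probability proportional to node degree in the positive edge set, and candidate pairs that are self-loops or known positive edges are rejected. The toy graph and all names (`node_degrees`, `sample_negative_edges`, `mean_endpoint_degree`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Count how many positive edges touch each node (undirected)."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def sample_negative_edges(nodes, positive_edges, n, degree_aware, rng,
                          max_tries=100_000):
    """Sample up to n distinct node pairs that are not positive edges.

    degree_aware=False: endpoints are drawn uniformly from all nodes.
    degree_aware=True: endpoints are drawn from the list of positive-edge
    endpoints, so each node is picked in proportion to its degree
    (an illustrative assumption, not the paper's exact procedure).
    """
    known = {frozenset(edge) for edge in positive_edges}
    if degree_aware:
        pool = [node for edge in positive_edges for node in edge]
    else:
        pool = list(nodes)
    negatives = set()
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        u, v = rng.choice(pool), rng.choice(pool)
        pair = frozenset((u, v))
        # Reject self-loops, known positives, and already-sampled pairs.
        if u == v or pair in known or pair in negatives:
            continue
        negatives.add(pair)
    return [tuple(sorted(pair)) for pair in negatives]


def mean_endpoint_degree(edges, degrees):
    """Average degree over both endpoints of every edge."""
    return sum(degrees[u] + degrees[v] for u, v in edges) / (2 * len(edges))


# Hypothetical toy graph: one hub inside a small ring, plus many
# low-degree node pairs, to mimic a skewed degree distribution.
ring = [(i, (i + 1) % 20) for i in range(20)]
hub = [(0, j) for j in range(2, 19)]
pairs = [(i, i + 1) for i in range(20, 200, 2)]
positives = ring + hub + pairs
nodes = range(200)
degrees = node_degrees(positives)

rng = random.Random(0)
uniform_neg = sample_negative_edges(nodes, positives, 300, False, rng)
aware_neg = sample_negative_edges(nodes, positives, 300, True, rng)

print("positives   :", mean_endpoint_degree(positives, degrees))
print("uniform neg :", mean_endpoint_degree(uniform_neg, degrees))
print("aware neg   :", mean_endpoint_degree(aware_neg, degrees))
```

On a graph with a skewed degree distribution, the uniformly sampled negatives tend to have a lower mean endpoint degree than the positives, which gives a classifier a degree-based shortcut. Drawing endpoints in proportion to degree narrows that gap.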

Journal: Bioinformatics advances	Publication Date: Jan 5, 2024
Citations: 1	License type: CC BY 4.0

Similar Papers

Personalized Ranking in Signed Networks Using Signed Random Walk with Restart
Jinhong Jung ... Woojeong Jin
01 Dec 2016

A Novel Algorithm to Compute Stable Groups in Signed Social Networks
Lakshmi Satya Vani Narayanam ... Satish V Motammanavar
01 Jan 2019

Random walk-based ranking in signed social networks: model and algorithms
Jinhong Jung ... Woojeong Jin
Knowledge and Information Systems | VOL. 62
06 May 2019

Inhomogeneous Models for Random Graphs and Spreading Processes: Applications in Wireless Sensor Networks and Social Networks
02 Oct 2019
