Abstract

Semi-supervised and unsupervised machine learning methods often rely on graphs to model data, prompting research on how theoretical properties of operators on graphs are leveraged in learning problems. While most of the existing literature focuses on undirected graphs, directed graphs are very important in practice, giving models for physical, biological or transportation networks, among many other applications. In this paper, we propose a new framework for rigorously studying continuum limits of learning algorithms on directed graphs. We use the new framework to study the PageRank algorithm and show how it can be interpreted as a numerical scheme on a directed graph involving a type of normalised graph Laplacian. We show that the corresponding continuum limit problem, which is taken as the number of webpages grows to infinity, is a second-order, possibly degenerate, elliptic equation that contains reaction, diffusion and advection terms. We prove that the numerical scheme is consistent and stable and compute explicit rates of convergence of the discrete solution to the solution of the continuum limit partial differential equation. We give applications to proving stability and asymptotic regularity of the PageRank vector. Finally, we illustrate our results with numerical experiments and explore an application to data depth.
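To make the interpretation above concrete, the standard personalised PageRank fixed-point equation can be rearranged into a Laplacian-driven linear equation. The rewriting below is a sketch using the usual conventions (teleportation parameter α, row-stochastic transition matrix P, teleportation distribution v), not necessarily the paper's exact notation.

```latex
% Personalised PageRank fixed point (standard form):
%   u = \alpha P^{\mathsf{T}} u + (1 - \alpha) v.
% Dividing by \alpha and rearranging gives a linear equation driven by
% a normalised-Laplacian-type operator on the directed graph:
\[
  \underbrace{\bigl(I - P^{\mathsf{T}}\bigr)}_{\text{Laplacian-type operator}} u
  \;+\; \frac{1-\alpha}{\alpha}\,\bigl(u - v\bigr) \;=\; 0,
\]
% i.e. a diffusion/advection term (the Laplacian of a non-symmetric
% transition matrix carries both) plus a reaction term and a source,
% matching the structure of the continuum PDE described above.
```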

Highlights

  • Due to their versatility in modelling data, graphs are frequently leveraged in machine learning and data science applications

  • Our main results are finite sample size error estimates that hold with high probability; they imply convergence in the continuum limit but are stronger in that they hold in the non-asymptotic regime. We use these results to prove stability of the PageRank problem, and we study the time-dependent version of the problem, which examines the continuum limit of the distribution of the random surfer (a simulation sketch follows this list)

  • In this paper, we established a new framework for rigorously studying continuum limits of discrete learning problems on directed graphs
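The time-dependent problem mentioned above has a simple discrete analogue: the distribution of the random surfer evolving under link-following plus teleportation. The sketch below simulates that evolution with NumPy; the web graph, teleportation parameter and dangling-node handling are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def surfer_distribution(A, alpha=0.85, steps=100):
    """Evolve a uniform initial distribution under PageRank dynamics.

    A     : adjacency matrix of a directed graph (A[i, j] = 1 if i -> j)
    alpha : probability of following an out-link (teleport w.p. 1 - alpha)
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling nodes teleport uniformly.
    P = np.where(out_deg > 0, A / np.maximum(out_deg, 1.0), 1.0 / n)
    u = np.full(n, 1.0 / n)  # uniform initial surfer distribution
    for _ in range(steps):
        # One time step: follow a link w.p. alpha, teleport otherwise.
        u = alpha * (P.T @ u) + (1.0 - alpha) / n
    return u

# Example: a small 4-page web graph.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(surfer_distribution(A))  # converges to the PageRank vector
```

As `steps` grows, the iterates converge geometrically (at rate α) to the stationary PageRank vector, which is the discrete object whose continuum limit the paper studies.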


Introduction

Due to their versatility in modelling data, graphs are frequently leveraged in machine learning and data science applications. For the linear 2-graph Laplacian, [26] used the maximum principle to establish discrete-to-continuum convergence rates for regression problems, and [15] used the maximum principle in combination with random walk arguments to establish convergence rates for semi-supervised learning at low labelling rates. Continuum limits allow us to prove stability of graph-based algorithms, showing that they are insensitive to the particular realisation of the data, and can often lead to new formulations of learning problems founded on stronger theoretical principles. Our main results are finite sample size error estimates that hold with high probability; they imply convergence in the continuum limit but are stronger in that they hold in the non-asymptotic regime. We use these results to prove stability of the PageRank problem, and we study the time-dependent version of the problem, which examines the continuum limit of the distribution of the random surfer. We present the results of numerical experiments confirming our theoretical results and exploring applications to data depth.
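The analysis is carried out on a random directed geometric graph (see the outline below). As a rough illustration of that kind of model, the following sketch samples points in the unit square and draws directed edges with an asymmetric neighbourhood rule; the drift-based edge rule, the bandwidth `eps` and all parameter names are hypothetical stand-ins for the paper's actual kernel and scalings.

```python
import numpy as np

rng = np.random.default_rng(0)

def directed_geometric_graph(n=500, eps=0.1, drift=0.02):
    """Sample n i.i.d. uniform points and connect eps-neighbours
    with a directed, slightly asymmetric edge rule."""
    X = rng.random((n, 2))                 # i.i.d. uniform samples in [0, 1]^2
    diff = X[None, :, :] - X[:, None, :]   # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=2)
    # Directed edge i -> j: j is an eps-neighbour of i lying (up to a
    # small tolerance) "downstream" in the first coordinate.
    A = (dist < eps) & (diff[:, :, 0] > -drift)
    np.fill_diagonal(A, False)             # no self-loops
    return X, A.astype(float)

X, A = directed_geometric_graph()
print("nodes:", X.shape[0], "directed edges:", int(A.sum()))
```

Because the edge rule is not symmetric, A is a genuinely directed adjacency matrix, which is the setting in which the normalised graph Laplacian and the PageRank scheme above are studied.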

Paper outline:

  • Random directed geometric graph
  • Main results
  • Outline
  • Consistency for Lₙ
  • Convergence proofs
  • Numerical experiments
  • Convergence rates and parameter scalings
  • PageRank for data depth
  • Conclusion