Concentration inequalities for two-sample rank processes with application to bipartite ranking

Stephan Clémençon, Nicolas Vayatis, Myrto Limnios

doi:10.1214/21-ejs1907

Abstract

The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be viewed as summaries of the ROC curve. In this paper, the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics is highlighted and concentration inequalities for collections of such random variables, referred to as two-sample rank processes here, are proved, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is next investigated from a theoretical perspective. It is also supported by empirical evidence through convincing numerical experiments.

Highlights

We analyze the experimental results by commenting on the test ROC curves of the scoring functions learned with the early-stopped version of Algorithm 1, which maximize the chosen Wφ-performance measure: MWW, Pol and RTB.
This article argues that two-sample linear rank statistics provide a very flexible and natural class of empirical performance measures for bipartite ranking.
We have shown that this class encompasses, in particular, well-known criteria used in medical diagnosis and information retrieval, and proved that, in expectation, these criteria are maximized by optimal scoring functions. Emphasis is also put on the optimal parameter of the gradient ascent (GA) algorithm for the class of scoring functions.

Summary

Motivation and preliminaries

We start by recalling key notions pertaining to ROC analysis and bipartite ranking, which essentially motivate the theoretical analysis carried out in the subsequent section. We recall at length the definition of two-sample linear rank statistics, which have been intensively used to design statistical (homogeneity) testing procedures in the univariate setup, and highlight that many scalar summaries of empirical ROC curves, commonly used as ranking performance criteria, are precisely of this form. The indicator function of any event E is denoted by I{E}, the Dirac mass at any point x by δx, and the generalized inverse of any cumulative distribution function W(t) on R ∪ {+∞} by W⁻¹(u) = inf{t ∈ ]−∞, +∞] : W(t) ≥ u}, u ∈ [0, 1]. We denote the floor and ceiling functions by u ∈ R ↦ ⌊u⌋ and u ∈ R ↦ ⌈u⌉ respectively.
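As an illustration of how a scalar summary of the empirical ROC curve can take the form of a two-sample linear rank statistic, here is a minimal Python sketch. It assumes the common convention in which each positive score contributes φ(rank / (N + 1)), where the rank is computed in the pooled sample of size N = n + m and the scores have no ties. The function names and the toy scores are hypothetical and are not taken from the paper. With φ(u) = u, the statistic reduces to a rescaled Mann-Whitney-Wilcoxon rank sum, from which the empirical AUC can be recovered.

```python
import numpy as np


def two_sample_rank_statistic(scores_pos, scores_neg, phi=lambda u: u):
    """Sum of phi(rank / (N + 1)) over the positive scores.

    Ranks are taken in the pooled sample (positives and negatives together).
    Assumption: no ties among the scores.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    pooled = np.concatenate([scores_pos, scores_neg])
    n_total = pooled.size
    # Rank of a positive score = number of pooled scores less than or equal to it.
    ranks_pos = np.array([(pooled <= s).sum() for s in scores_pos])
    return float(np.sum(phi(ranks_pos / (n_total + 1))))


def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked in the right order."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean(pos > neg))


# Toy scores (hypothetical): n = 3 positives, m = 4 negatives.
scores_pos = [0.9, 0.7, 0.4]
scores_neg = [0.8, 0.3, 0.2, 0.1]
n, m = len(scores_pos), len(scores_neg)

# MWW case: phi(u) = u.
w_mww = two_sample_rank_statistic(scores_pos, scores_neg)

# Rank-sum identity (no ties): sum of positive ranks = n * m * AUC + n * (n + 1) / 2.
auc_from_ranks = ((n + m + 1) * w_mww - n * (n + 1) / 2) / (n * m)
auc_direct = empirical_auc(scores_pos, scores_neg)

print(w_mww, auc_from_ranks, auc_direct)
```

Other score-generating functions can be passed through the `phi` argument, for instance `phi=lambda u: u**3` to put more weight on the highest ranks. This particular choice is only an example and is not a setting reported in the paper.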

Bipartite ranking and ROC analysis

Two-sample linear rank statistics

Bipartite ranking as maximization of two-sample rank statistics

Concentration inequalities for two-sample rank processes

Performance of maximizers of two-sample rank statistics in bipartite ranking

Generalization error bounds and model selection

Kernel regularization for ranking performance maximization

Numerical experiments

A gradient-based algorithmic approach

Synthetic data generation

Results and discussion

Conclusion

Hajek projection method

U-statistics and U-processes

VC-type classes of functions – permanence properties

Proof of Proposition 4

Permanence properties

Proof of Lemma 17

Proof of Lemma 18

A generalization bound in expectation

Proof of Proposition 8

Proof of Proposition 9

Proof of Lemma 16

Location model

Journal: Electronic Journal of Statistics
Publication Date: Jan 1, 2021
License type: cc-by
