Abstract

This paper is concerned with statistical inference problems for datasets whose elements may be modeled as random probability measures, such as multiple histograms or point clouds. We review recent contributions in statistics on the use of Wasserstein distances and tools from optimal transport to analyse such data. In particular, we highlight the benefits of using the notions of barycenter and geodesic PCA in the Wasserstein space for learning the principal modes of geometric variation in a dataset. In this setting, we discuss existing works and present some research perspectives related to the emerging field of statistical optimal transport.

Highlights

  • This paper is concerned with statistical inference problems for datasets whose elements may be modeled as random probability measures, such as multiple histograms or point clouds

  • It has been widely recognized that a significant gain in statistical inference can be achieved by the use of non-Euclidean distances to better capture the geometry of the sets to which the data truly belong

  • This leads to statistical inference problems for multiple point clouds, such as those displayed in Figure 2, that can be modeled as discrete probability measures supported on R^d

Summary

The emerging field of statistical optimal transport

In many fields of interest (e.g. signal and image processing or bioinformatics), one records data in the form of high-dimensional vectors or matrices. The development of this technology leads to datasets made of multiple measurements (e.g. up to 18) of millions of individual cells from different subjects. This leads to statistical inference problems for multiple point clouds, such as those displayed in Figure 2, that can be modeled as discrete probability measures supported on R^d. Techniques based on optimal transport for data science have recently received increasing interest in mathematical and computational statistics [8,10,11,12,13,14,15,28,29,44,45,47,49,51,54,58,62,66,67], machine learning [4,6,22,23,32,36,37,38,40,55,57], image processing and computer vision [7,17,24,30,31,41,52,59,60], and computational biology [56]. Throughout the paper, we discuss some research perspectives and open problems related to the statistical aspects of barycenters and geodesic PCA (GPCA) in the Wasserstein space.
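To make the idea of treating point clouds as discrete probability measures concrete, here is a minimal sketch restricted to the one-dimensional case with equally weighted atoms and equal sample sizes. Under those assumptions, the 2-Wasserstein distance reduces to comparing sorted samples, and the equal-weight barycenter is obtained by averaging sorted samples. The function names `wasserstein2_1d` and `barycenter_1d` are hypothetical and chosen for illustration only; this is not taken from the paper.

```python
import numpy as np


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two 1D empirical measures.

    Assumption of this sketch: both point clouds have the same number of
    atoms, each carrying the same mass, so the optimal coupling matches
    sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))


def barycenter_1d(clouds):
    """Atoms of the equal-weight 2-Wasserstein barycenter of 1D point clouds.

    In one dimension the barycenter's quantile function is the average of
    the input quantile functions; for equal-size clouds this amounts to
    averaging the sorted samples.
    """
    sorted_clouds = np.sort(np.asarray(clouds, dtype=float), axis=1)
    return sorted_clouds.mean(axis=0)


# Two toy point clouds: the second is the first shifted by 1.
clouds = [[0.0, 2.0, 1.0], [3.0, 1.0, 2.0]]
print(wasserstein2_1d(clouds[0], clouds[1]))
print(barycenter_1d(clouds))
```

For point clouds in R^d with d > 1, no such sorting shortcut is available and one would typically solve a linear program or use a regularized solver instead.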

The Wasserstein metric
Regularized optimal transport
Wasserstein barycenters
Statistical models of random probability measures
Law of large numbers
Rate of convergence in the one dimensional case
Regularization of Wasserstein barycenters
Penalized Wasserstein barycenters
Sinkhorn barycenters
Data-driven regularization in computational optimal transport
Geometry of the Wasserstein space and geodesic PCA
Geometry of the Wasserstein space
Geodesic PCA in the one-dimensional case