Curve Registration of Functional Data for Approximate Bayesian Computation

Anthony Ebert, Paul Wu, Fabrizio Ruggeri, Kerrie Mengersen

doi:10.3390/stats4030045

Abstract

Approximate Bayesian computation is a likelihood-free inference method which relies on comparing model realisations to observed data with informative distance measures. We obtain functional data that are not only subject to noise along their y axis but also to a random warping along their x axis, which we refer to as the time axis. Conventional distances on functions, such as the L2 distance, are not informative under these conditions. The Fisher–Rao metric, previously generalised from the space of probability distributions to the space of functions, is an ideal objective function for aligning one function to another by warping the time axis. We assess the usefulness of alignment with the Fisher–Rao metric for approximate Bayesian computation with four examples: two simulation examples, an example about passenger flow at an international airport, and an example of hydrological flow modelling. We find that the Fisher–Rao metric works well as the objective function to minimise for alignment; however, once the functions are aligned, it is not necessarily the most informative distance for inference. This means that likelihood-free inference may require two distances: one for alignment and one for parameter inference.

Highlights

The plots show that the registered d-samplers outperform their unregistered counterparts. Although the Fisher–Rao (FR) metric is an ideal distance for aligning functions, it is sometimes outperformed by maximum mean discrepancy (MMD) when used as a distance for approximate Bayesian computation (ABC).
Curve registration techniques are applied, for the first time, to the problem of misaligned functional data in likelihood-free inference.
The best distance for alignment is not necessarily the best distance for ABC samplers once alignment has been performed.

Summary

Introduction

What constitutes the sample space is sometimes a matter of perspective. One perspective is that the data comprise a set of functions rather than numbers [1]. Suppose that, using ultraviolet-visible spectroscopy, we record the absorbance of a series of samples over a wide range of frequencies. We might regard the sample number and frequency as explanatory variables and absorbance as the response variable. Alternatively, for each sample, we could consider the entire functional form of the absorption spectrum as the response variable, in which case the only explanatory variable is the sample number. The sample space could therefore comprise either the output range of the machine (a subset of the real line) or the function space of all possible spectra.
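To make the functional view concrete, the following minimal sketch treats each curve as a single observation and compares curves with an L2 distance. It also illustrates the abstract's claim that the L2 distance is uninformative under time warping. Everything in it is an illustrative assumption rather than the authors' method: the Gaussian `bump` shape, the warping function `gamma(t) = t**a`, and the exponent `a = 1.6` are hypothetical. The realignment step uses the known inverse warp as a stand-in for an estimated one; it does not implement Fisher–Rao alignment.

```python
import numpy as np

def l2_distance(f, g, t):
    """Approximate the L2 distance between two curves sampled on the grid t."""
    sq = (f - g) ** 2
    # Trapezoidal rule written out explicitly.
    return float(np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t))))

def bump(t, height=1.0, centre=0.5, width=0.05):
    """Hypothetical curve shape: a single Gaussian peak."""
    return height * np.exp(-0.5 * ((t - centre) / width) ** 2)

t = np.linspace(0.0, 1.0, 501)
a = 1.6  # assumed warp exponent; gamma(t) = t**a is increasing and fixes 0 and 1

observed = bump(t)
warped = bump(t ** a)         # same shape, time axis warped: f(gamma(t))
halved = bump(t, height=0.5)  # genuinely different shape, no warp

d_warped = l2_distance(observed, warped, t)
d_halved = l2_distance(observed, halved, t)

# Undo the known warp: warped(gamma^{-1}(t)) = f(t), via linear interpolation.
realigned = np.interp(t ** (1.0 / a), t, warped)
d_realigned = l2_distance(observed, realigned, t)

print(d_warped, d_halved, d_realigned)
```

Under these assumptions, the warped curve is farther from the observed curve in L2 than the half-height curve is, even though it has the same underlying shape. After the warp is undone, the distance is close to zero. This is the motivation for aligning curves before computing a distance for inference.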

Methods

Results

Conclusion