Identifiability and inference of phylogenetic birth–death models

Brandon Legried,Jonathan Terhorst

doi:10.1016/j.jtbi.2023.111520

Abstract

Recent theoretical work on phylogenetic birth–death models offers differing viewpoints on whether they can be estimated using lineage-through-time data. Louca and Pennell (2020) showed that the class of models with continuously differentiable rate functions is nonidentifiable: any such model is consistent with an infinite collection of alternative models, which are statistically indistinguishable regardless of how much data are collected. Legried and Terhorst (2022) qualified this grave result by showing that identifiability is restored if only piecewise constant rate functions are considered.Here, we contribute new theoretical results to this discussion, in both the positive and negative directions. Our main result is to prove that models based on piecewise polynomial rate functions of any order and with any (finite) number of pieces are statistically identifiable. In particular, this implies that spline-based models with an arbitrary number of knots are identifiable. The proof is simple and self-contained, relying mainly on basic algebra. We complement this positive result with a negative one, which shows that even when identifiability holds, rate function estimation is still a difficult problem. To illustrate this, we prove some rates-of-convergence results for hypothesis testing using birth–death models. These results are information-theoretic lower bounds which apply to all potential estimators.

Full Text