Given a labeled tree topology t of n taxa, consider a population P of k leaves chosen among those of t. The clade of P is the minimal subtree P̂ of t containing P, and its size |P̂| is the number of leaves in the clade. We study distributional properties of the clade size variable |P̂| over labeled topologies of size n generated at random in the framework of Ford's α-model. Under this model, starting from the one-taxon labeled topology, a random labeled topology is produced iteratively by a sequence of α-insertions, each of which attaches a new pendant edge to either a pendant or an internal edge of the current labeled topology, with a probability that depends on the parameter α∈[0,1]. Different values of α determine different probability distributions over the set of labeled topologies of a given size n, with the special cases α=0 and α=1/2 corresponding to the Yule and uniform distributions, respectively. In the first part of the paper, we consider a labeled topology t of size n generated by a sequence of random α-insertions starting from a fixed labeled topology t* of given size k, and determine the probability mass function, mean, and variance of the clade size |P̂| in t when P is chosen as the set of leaves of t inherited from t*. In the second part of the paper, we calculate the probability that a set P of k leaves chosen at random in a Ford-distributed labeled topology of size n is monophyletic, that is, the probability that |P̂|=k. Our investigations extend previous results on clade size statistics obtained for Yule-distributed and uniformly distributed labeled topologies.
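The setting of the first part can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes the usual formulation of Ford's model, in which an α-insertion picks a pendant edge with weight 1−α and an internal edge (including the root edge) with weight α. The names `Tree`, `clade_size`, and `simulate_clade_size` are hypothetical, and for simplicity the starting topology t* is itself grown by α-insertions rather than fixed in advance.

```python
import random


class Tree:
    """Rooted binary tree; each edge is identified by the node below it."""

    def __init__(self):
        self.parent = {0: None}   # node -> parent (None for the root)
        self.children = {0: []}   # node -> children ([] for a leaf)
        self.next_id = 1

    def leaves(self):
        return [v for v, c in self.children.items() if not c]

    def insert_on_edge(self, v):
        """Subdivide the edge above v, hang a new leaf there, return the leaf."""
        u, w = self.next_id, self.next_id + 1   # new internal node, new leaf
        self.next_id += 2
        p = self.parent[v]
        self.parent[u], self.children[u] = p, [v, w]
        self.parent[v], self.parent[w] = u, u
        self.children[w] = []
        if p is not None:
            siblings = self.children[p]
            siblings[siblings.index(v)] = u
        return w

    def alpha_insertion(self, alpha, rng):
        """Assumed weights: 1 - alpha per pendant edge, alpha per internal edge."""
        nodes = list(self.children)
        if len(nodes) == 1:       # one-taxon tree: the only edge is chosen
            return self.insert_on_edge(nodes[0])
        weights = [alpha if self.children[v] else 1.0 - alpha for v in nodes]
        return self.insert_on_edge(rng.choices(nodes, weights=weights)[0])


def clade_size(tree, population):
    """Number of leaves below the most recent common ancestor of population."""
    population = list(population)
    path, v = [], population[0]
    while v is not None:          # ancestors of the first leaf, bottom-up
        path.append(v)
        v = tree.parent[v]
    index = {v: i for i, v in enumerate(path)}
    top = 0
    for leaf in population[1:]:   # climb until the first leaf's path is hit
        v = leaf
        while v not in index:
            v = tree.parent[v]
        top = max(top, index[v])
    stack, count = [path[top]], 0
    while stack:                  # count the leaves below the common ancestor
        v = stack.pop()
        if tree.children[v]:
            stack.extend(tree.children[v])
        else:
            count += 1
    return count


def simulate_clade_size(k, n, alpha, rng):
    """Grow a tree to k leaves, then to n; return the clade size of the first k."""
    tree = Tree()
    while len(tree.leaves()) < k:
        tree.alpha_insertion(alpha, rng)
    founders = tree.leaves()      # plays the role of P, inherited from t*
    for _ in range(n - k):
        tree.alpha_insertion(alpha, rng)
    return clade_size(tree, founders)
```

Averaging `simulate_clade_size(k, n, alpha, random.Random(seed))` over many seeds gives an empirical estimate that could be compared against a closed-form mean, and the fraction of runs returning exactly k estimates how often the inherited leaves remain monophyletic.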