Abstract

In this paper we investigate the geometry of a discrete Bayesian network whose graph is a tree, all of whose variables are binary, and whose only observed variables are those labeling its leaves. We provide the full geometric description of these models, given by a set of polynomial equations together with a set of complementary implied inequalities induced by the positivity of probabilities on the hidden variables. The phylogenetic invariants given by the equations can be useful in the construction of simple diagnostic tests. However, in this paper we point out the importance of also incorporating the associated inequalities into any statistical analysis. The full characterization of these inequality constraints derived here helps us determine how and why routine statistical methods can break down for this model class.
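
To make the positivity-induced inequalities concrete, consider the smallest interesting case, chosen here for illustration and not taken from the paper: the tripod, a star tree with one hidden binary root H and three binary leaves. Writing E[Xi | H] = a_i + b_i H, conditional independence of the leaves given H gives cov(Xi, Xj) = b_i b_j Var(H) for i != j, so cov12 * cov13 * cov23 = (b1 b2 b3)^2 Var(H)^3 >= 0. The sketch below computes the joint leaf distribution by brute force and evaluates this product; the function names and parametrization are hypothetical and this is not the authors' implementation.

```python
import random
from itertools import product


def tripod_joint(p_root, p_leaf):
    """Joint law of binary leaves (X1, X2, X3) that are conditionally
    independent given a hidden binary root H.

    p_root -- P(H = 1)
    p_leaf -- three pairs (P(Xi = 1 | H = 0), P(Xi = 1 | H = 1))
    Returns a dict mapping each (x1, x2, x3) to its probability.
    """
    joint = {}
    for x in product((0, 1), repeat=3):
        total = 0.0
        for h, p_h in ((0, 1 - p_root), (1, p_root)):
            term = p_h
            for x_i, q in zip(x, p_leaf):
                term *= q[h] if x_i == 1 else 1 - q[h]
            total += term
        joint[x] = total
    return joint


def covariance(joint, i, j):
    """Cov(Xi, Xj) computed from a dict {binary tuple: probability}."""
    e_i = sum(p for x, p in joint.items() if x[i] == 1)
    e_j = sum(p for x, p in joint.items() if x[j] == 1)
    e_ij = sum(p for x, p in joint.items() if x[i] == 1 and x[j] == 1)
    return e_ij - e_i * e_j


def covariance_product(joint):
    """cov12 * cov13 * cov23, which is >= 0 at every point of the tripod model."""
    return covariance(joint, 0, 1) * covariance(joint, 0, 2) * covariance(joint, 1, 2)


if __name__ == "__main__":
    random.seed(0)
    worst = min(
        covariance_product(
            tripod_joint(random.random(), [(random.random(), random.random()) for _ in range(3)])
        )
        for _ in range(1000)
    )
    print("smallest product over 1000 random model points:", worst)

    # A valid distribution on {0,1}^3 that lies outside the tripod model.
    outside = {(1, 1, 0): 1 / 3, (1, 0, 1): 1 / 3, (0, 0, 0): 1 / 3}
    print("product for the outside point:", covariance_product(outside))  # -1/729 up to rounding
```

The three-point distribution at the end is a perfectly valid probability distribution, yet cov23 < 0 < cov12, cov13, so no tripod model can produce it. Because the binary tripod has 7 free parameters and fills the 7-dimensional probability simplex up to inequalities, no polynomial equation separates this point from the model; only an inequality does.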

Highlights

  • A Bayesian network whose graph is a tree, all of whose inner nodes represent variables that are not directly observed, defines an important class of models containing both phylogenetic tree models and hidden Markov models

  • In [40] we established a useful new coordinate system for analyzing such models when all of the variables are binary. This reparametrization enabled us to address various identifiability issues and to derive exact formulae for the maximum likelihood estimators when the sample proportions lie in the model class

  • In this paper we provide the full semialgebraic description of MκT, that is, the complete set of polynomial equations and inequalities in the tree cumulants that describes MκT as a subset of KT, for subsequent use in a statistical analysis of the model class
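
A semialgebraic description pairs equations with inequalities. The sketch below illustrates this pairing in ordinary leaf covariances, not in the tree-cumulant coordinates of [40], whose definitions are not reproduced here. On the quartet tree 12|34 with hidden node H1 adjacent to leaves 1 and 2 and hidden node H2 adjacent to leaves 3 and 4, write E[Xi | H1] = a_i + b_i H1 for i = 1, 2 and E[Xj | H2] = a_j + b_j H2 for j = 3, 4. Conditional independence then gives cov(Xi, Xj) = b_i b_j Cov(H1, H2) for i in {1, 2}, j in {3, 4}, together with cov12 = b1 b2 Var(H1) and cov34 = b3 b4 Var(H2). Hence the invariant cov13 cov24 - cov14 cov23 vanishes, while cov13 cov24 / (cov12 cov34) = Corr(H1, H2)^2 must lie in [0, 1]. The parametrization and function names are hypothetical.

```python
import random
from itertools import product


def bern(q, v):
    """P(V = v) for a Bernoulli(q) variable V."""
    return q if v == 1 else 1 - q


def quartet_joint(p_h1, p_h2_given_h1, p_leaf):
    """Joint law of leaves (X1, X2, X3, X4) on the quartet tree 12|34.

    Hidden H1 is the parent of X1, X2 and of hidden H2; H2 is the parent of X3, X4.
    p_h1          -- P(H1 = 1)
    p_h2_given_h1 -- (P(H2 = 1 | H1 = 0), P(H2 = 1 | H1 = 1))
    p_leaf[i]     -- (P(Xi = 1 | parent = 0), P(Xi = 1 | parent = 1))
    """
    joint = {}
    for x in product((0, 1), repeat=4):
        total = 0.0
        for h1, h2 in product((0, 1), repeat=2):
            term = bern(p_h1, h1) * bern(p_h2_given_h1[h1], h2)
            for x_i, parent, q in zip(x, (h1, h1, h2, h2), p_leaf):
                term *= bern(q[parent], x_i)
            total += term
        joint[x] = total
    return joint


def covariances(joint, n=4):
    """Dict {(i, j): Cov(Xi, Xj)} for i < j (0-based leaf indices)."""
    mean = [sum(p for x, p in joint.items() if x[i]) for i in range(n)]
    return {
        (i, j): sum(p for x, p in joint.items() if x[i] and x[j]) - mean[i] * mean[j]
        for i in range(n)
        for j in range(i + 1, n)
    }


def tetrad(cov):
    """Invariant cov13*cov24 - cov14*cov23, which vanishes on the 12|34 model."""
    return cov[0, 2] * cov[1, 3] - cov[0, 3] * cov[1, 2]


def tetrad_ratio(cov):
    """cov13*cov24 / (cov12*cov34); lies in [0, 1] on the model when defined."""
    return cov[0, 2] * cov[1, 3] / (cov[0, 1] * cov[2, 3])


if __name__ == "__main__":
    random.seed(0)
    joint = quartet_joint(
        random.random(),
        (random.random(), random.random()),
        [(random.random(), random.random()) for _ in range(4)],
    )
    cov = covariances(joint)
    print("tetrad:", tetrad(cov), "ratio:", tetrad_ratio(cov))

    # Hypothetical covariances (not checked to be realizable by any joint law):
    # they satisfy the invariant exactly but violate the ratio bound.
    hyp = {(0, 1): 0.01, (2, 3): 0.01, (0, 2): 0.02, (1, 3): 0.02, (0, 3): 0.02, (1, 2): 0.02}
    print("tetrad:", tetrad(hyp), "ratio:", tetrad_ratio(hyp))
```

The hypothetical covariances at the end would pass a diagnostic based on the invariant alone, yet no point of the model has them; this is the kind of gap that the inequality constraints close.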


Summary

Introduction

A Bayesian network whose graph is a tree, all of whose inner nodes represent variables that are not directly observed, defines an important class of models containing both phylogenetic tree models and hidden Markov models. The additional inequalities obtained as the main result of this paper complete the description of these models given by polynomial equations. Where and how these inequality constraints can helpfully supplement an analysis based on phylogenetic invariants is illustrated by the simple example given below. It has been known for some time that the constraints on possible distances between any two leaves in the tree imply some additional inequality constraints on the possible covariances between the binary variables represented by the leaves. The new coordinate system for tree models that we introduced in [40] enables us to explore this relationship between probabilistic tree models (called tree decomposable distributions in [27]) and tree metrics in detail, and to extend these results.
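
The link between tree metrics and covariance inequalities can be seen on the tripod (one hidden binary root H, three binary leaves). Conditional independence gives Corr(Xi, Xj) = r_i r_j with r_i = Corr(Xi, H), so the distance d_ij = -log|Corr(Xi, Xj)| is additive along the tree, with edge lengths e_i = -log|r_i| >= 0. Requiring nonnegative edge lengths is then exactly an inequality on the leaf correlations; for example, e1 >= 0 is equivalent to |rho12 rho13| <= |rho23|. The sketch assumes this particular distance; the paper's own distance and notation may differ, and the function names are hypothetical.

```python
import math


def tripod_edge_lengths(rho12, rho13, rho23):
    """Edge lengths (e1, e2, e3) of the star tree fitted to the leaf distances
    d_ij = -log|rho_ij|.  All three correlations must be nonzero."""
    d12, d13, d23 = (-math.log(abs(r)) for r in (rho12, rho13, rho23))
    return (
        (d12 + d13 - d23) / 2,  # edge from the hidden root to leaf 1
        (d12 + d23 - d13) / 2,  # edge to leaf 2
        (d13 + d23 - d12) / 2,  # edge to leaf 3
    )


def is_tree_metric(rho12, rho13, rho23, tol=1e-12):
    """True when every fitted edge length is nonnegative (up to tol).
    e1 >= 0 is equivalent to |rho12 * rho13| <= |rho23|, and similarly for e2, e3."""
    return all(e >= -tol for e in tripod_edge_lengths(rho12, rho13, rho23))


if __name__ == "__main__":
    # Model point: rho_ij = r_i * r_j with r_i = Corr(Xi, H).
    r = (0.8, -0.5, 0.9)
    rho = (r[0] * r[1], r[0] * r[2], r[1] * r[2])
    print(tripod_edge_lengths(*rho))  # approximately (0.223, 0.693, 0.105)
    print(is_tree_metric(*rho))       # True

    # Hypothetical correlations: |rho12 * rho13| = 0.36 > 0.3 = |rho23|.
    print(is_tree_metric(0.6, 0.6, 0.3))  # False
```

The metric only sees absolute values, so the sign condition rho12 rho13 rho23 >= 0 (equivalently cov12 cov13 cov23 >= 0) is a separate inequality that distances alone do not capture.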

Tree models and tree cumulants
Inferential issues related to the semialgebraic description
Explicit expression of implied inequality constraints
Example
Discussion