Abstract

Tree structures, which show hierarchical relationships and the latent structures between samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable measured on each sample and the latent group structure represented by a given tree. Currently, this is addressed on an ad hoc basis, usually requiring the user to decide on an appropriate number of clusters to prune out of the tree and test against the response variable. Here, we present a method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy with high power while accounting for the overall false positive error rate. This enhances the robustness and reproducibility of such findings.

Highlights

  • treeSeg tests the response of interest against the given tree structure

  • If there is no association between the tree T and the response measurement y_i, the observed responses y_i would be randomly distributed on the leaves of the tree, independent of the tree structure T

  • If the distribution of responses is associated with the tree structure, we may observe clades in the tree with distinct response distributions


Summary

Introduction

Every tree T, independent of how it was generated, induces some latent ordering of the samples. treeSeg tests the response of interest against the given tree structure: it asks whether, for this particular ordering, the distribution of the independent observations y_i depends on their locations on the tree. At present, users typically decide on the number of clusters on an ad hoc basis (e.g., after plotting the response measurement on the leaves of the tree and deciding visually which clusters to choose), and these clusters are then tested for association with the outcome of interest. This lack of rigorous statistical methodology has limited the translational application and reproducibility of such analyses. treeSeg instead embeds the tree segmentation problem into a change-point detection setting [8,9,10,11,12,13].
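
To make the null hypothesis concrete, the following is a minimal sketch, not the treeSeg algorithm itself. It assumes binary responses listed in the leaf order induced by the tree, so that each clade corresponds to a contiguous interval of leaves. It swaps in a plain permutation test in place of the change-point machinery: responses are shuffled across leaves to mimic "no association", and the largest clade-versus-rest likelihood ratio is compared with its shuffled counterparts. All names (clade_statistic, permutation_p_value, the toy clades list) are hypothetical and chosen only for illustration.

```python
import math
import random


def bernoulli_loglik(successes, n):
    """Maximised Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0 or successes == 0 or successes == n:
        return 0.0  # degenerate cases: the fitted likelihood equals 1
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def clade_statistic(y, start, end):
    """Likelihood ratio: leaves y[start:end] get their own rate vs. one common rate."""
    inside = y[start:end]
    outside = y[:start] + y[end:]
    split = (bernoulli_loglik(sum(inside), len(inside))
             + bernoulli_loglik(sum(outside), len(outside)))
    pooled = bernoulli_loglik(sum(y), len(y))
    return 2.0 * (split - pooled)


def max_statistic(y, clades):
    """Largest clade-vs-rest statistic over all candidate clades."""
    return max(clade_statistic(y, s, e) for s, e in clades)


def permutation_p_value(y, clades, n_perm=2000, seed=0):
    """Shuffle responses over the leaves to simulate 'no association with the tree'."""
    rng = random.Random(seed)
    observed = max_statistic(y, clades)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(shuffled, clades) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Toy data (assumed): 20 leaves in tree order; the last 8 form a clade of responders.
y = [0] * 12 + [1] * 8
# Candidate clades of a toy tree, as half-open intervals of leaf positions.
clades = [(0, 12), (12, 20), (0, 6), (6, 12), (12, 16), (16, 20)]

p = permutation_p_value(y, clades)
print("max statistic:", max_statistic(y, clades))
print("permutation p-value:", p)
```

Taking the maximum over all candidate clades before comparing with the shuffled data is what keeps the overall false positive rate in check in this sketch; testing each clade separately at the same level would not. Unlike treeSeg, the sketch does not estimate which clades differ or provide confidence statements for them.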

