Abstract

We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence length tends to infinity, we show that, for any fixed k, no statistically consistent phylogeny estimation is possible from k-mer counts over the full leaf sequences alone. Formally, we establish that the joint distributions of k-mer counts over the entire leaf sequences on two distinct trees have total variation distance bounded away from 1 as the sequence length tends to infinity. Our impossibility result implies that statistical consistency requires more sophisticated use of k-mer count information, such as the block techniques developed in previous theoretical work.
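
To make the object of study concrete, the following is a minimal sketch of what "k-mer counts over the full leaf sequence" could look like for a single two-state sequence. The function name `kmer_counts`, the encoding of states as the characters `0` and `1`, and the use of overlapping windows are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping length-k substring (k-mer) of seq.

    Hypothetical helper: assumes the two states are encoded as the
    characters '0' and '1' and that windows overlap (stride 1).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none if n < k, in which case the range below is empty).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example: 2-mer counts of one short two-state leaf sequence.
counts = kmer_counts("0110100", 2)
```

Note that the counts discard the positions at which each k-mer occurs, so distinct sequences can share the same count vector. The result above concerns the joint distribution of such count vectors across all leaves of the tree.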
