Abstract

Phylogenetic trees are critical in human genome research for investigating human evolution and identifying disease-associated genetic markers. New high-throughput genome sequencing technologies create an urgent need for statistical methods that can construct phylogenetic trees from long genome sequences quickly while accounting for various biological complexities. An ancestral mixture model has been proposed to this end [Chen SC, Lindsay BG. Building mixture trees from binary sequence data. Biometrika. 2006;93(4):843–860. doi: 10.1093/biomet/93.4.843]; it allows for genetic mutations over generations but omits another essential evolutionary factor, genetic recombination. In this paper, we therefore develop a novel genetic recombination model for tree construction and propose using Markov chain composite likelihood (MCCL) to make model estimation computationally feasible. To further reduce computational complexity, a hierarchical estimator is constructed to estimate the unknown ancestral distributions through MCCL. Simulation studies and a real data example show that the proposed methods perform well and run quickly, and so have the potential to be applied to long-sequence genome data.
