Abstract

BackgroundConventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus (Dirichlet-Multinomial Phylogenetic Clustering), that identifies sets of sequences resulting from quick transmission chains, thus yielding easily-interpretable clusters, without using any ad hoc distance or confidence requirement.ResultsSimulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis.ConclusionsDM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.

Highlights

  • Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates

  • Phylogenetic models represent the ancestral relationships between sequences of nucleotides or amino acids with a hierarchical tree structure known as a phylogeny

  • Phylogenetics can help guide public health efforts to curb incidence of HIV-1 and tuberculosis [3,4,5], by revealing the existence of transmission clusters, epidemiologically-linked individuals infected by a genetically-similar pathogen

Read more

Summary

Introduction

Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus (Dirichlet-Multinomial Phylogenetic Clustering), that identifies sets of sequences resulting from quick transmission chains, yielding -interpretable clusters, without using any ad hoc distance or confidence requirement. The collection and, often public, availability of viral genotyping data has made phylogenetics, the field concerned with the inference from genetic data of the ancestral history of organisms, a popular tool for modelling epidemics [1, 2]. Phylogenetic models represent the ancestral relationships between sequences of nucleotides or amino acids with a hierarchical tree structure known as a phylogeny. To estimate transmission clusters from an inferred phylogeny, a collection of ad hoc rules are conventionally applied. A clade corresponds to a cluster only when it is known with high confidence, and when its sequences are similar. Disagreements over clustering rules are common, and what the resulting partitions mean in an epidemiological sense is still unclear [8, 9]

Objectives
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call