Abstract

Current phylogenetic clustering approaches for identifying pathogen transmission clusters are limited by their dependency on arbitrarily defined genetic distance thresholds for within-cluster divergence. Incomplete knowledge of a pathogen’s underlying dynamics often reduces the choice of distance threshold to an exploratory, ad hoc exercise that is difficult to standardise across studies. Phydelity is a new tool for the identification of transmission clusters in pathogen phylogenies. It identifies groups of sequences that are more closely related than the ensemble distribution of the phylogeny under a statistically principled and phylogeny-informed framework, without the introduction of arbitrary distance thresholds. Relative to other distance threshold- and model-based methods, Phydelity outputs clusters with higher purity and lower probability of misclassification in simulated phylogenies. Applying Phydelity to empirical datasets of hepatitis B and C virus infections showed that Phydelity identified clusters with better correspondence to individuals that are more likely to be linked by transmission events relative to other widely used non-parametric phylogenetic clustering methods without the need for parameter calibration. Phydelity is generalisable to any pathogen and can be used to identify putative direct transmission events. Phydelity is freely available at https://github.com/alvinxhan/Phydelity.

Highlights

  • Recent advances in high-throughput sequencing technologies have led to the widespread use of sequence data in infectious disease epidemiology (Gardy and Loman 2017)

  • Requiring only the phylogenetic tree as input, Phydelity infers putative transmission clusters by identifying groups of sequences that are more closely related to one another than the ensemble distribution under a statistically principled framework. Like PhyCLIP, another phylogenetic clustering tool that we recently developed, Phydelity is based on integer linear programming (ILP) optimisation (Han et al. 2019)

  • Phydelity was applied to simulated HIV epidemics among men who have sex with men (MSM) belonging to hypothetical sexual contact network structures, where transmission clusters were attributed to transmission by sexual contact among individuals belonging to the same subnetwork



Introduction

Recent advances in high-throughput sequencing technologies have led to the widespread use of sequence data in infectious disease epidemiology (Gardy and Loman 2017). Requiring only the phylogenetic tree as input, Phydelity infers putative transmission clusters by identifying groups of sequences that are more closely related to one another than the ensemble distribution under a statistically principled framework. Phydelity, like PhyCLIP, another phylogenetic clustering tool that we recently developed, is based on integer linear programming (ILP) optimisation (Han et al. 2019). PhyCLIP uses the divergence information of the entire phylogenetic tree to inclusively assign statistically supported cluster membership to as many sequences in the tree as possible, yielding clusters that putatively capture variant ecological, evolutionary or epidemiological processes. PhyCLIP’s designated clusters are underpowered to be interpreted as sequences linked by transmission events, whereas clusters inferred by Phydelity can be interpreted as putative transmission clusters.
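
To make the idea of a data-derived limit concrete, the following is a minimal toy sketch, not Phydelity's actual algorithm. It assumes a small table of hypothetical pairwise patristic distances and uses the median of all tree-wide pairwise distances as the limit, which is an illustrative stand-in; the ILP optimisation and the limit Phydelity itself derives are not reproduced here. All names (`RAW`, `ensemble_limit`, `is_tight_group`) are hypothetical.

```python
from itertools import combinations
from statistics import median

# Hypothetical pairwise patristic distances between five tips (assumed data).
RAW = {
    ("A", "B"): 0.01, ("A", "C"): 0.02, ("B", "C"): 0.02,
    ("A", "D"): 0.20, ("A", "E"): 0.22, ("B", "D"): 0.21,
    ("B", "E"): 0.23, ("C", "D"): 0.20, ("C", "E"): 0.22,
    ("D", "E"): 0.05,
}
# Key by unordered pair so lookups do not depend on tip order.
DIST = {frozenset(pair): d for pair, d in RAW.items()}
TIPS = ["A", "B", "C", "D", "E"]


def pairwise_distances(tips):
    """Return the distances for every unordered pair of the given tips."""
    return [DIST[frozenset(pair)] for pair in combinations(tips, 2)]


def ensemble_limit(tips):
    """Toy limit: the median of all tree-wide pairwise distances.

    This replaces a user-chosen threshold with one computed from the
    tree itself; it is an assumption made for illustration only.
    """
    return median(pairwise_distances(tips))


def is_tight_group(group, tips=TIPS):
    """True if the group's largest internal distance is below the toy limit."""
    return max(pairwise_distances(group)) < ensemble_limit(tips)


if __name__ == "__main__":
    print(is_tight_group(["A", "B", "C"]))  # closely related tips
    print(is_tight_group(["A", "D", "E"]))  # spans two distant lineages
```

The point of the sketch is only that the limit is computed from the distribution of distances in the tree rather than supplied by the user; the choice of the median and of the maximum within-group distance are assumptions made here for brevity.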

