Sequence similarity between pathogen genomes can be used to infer the relatedness of isolates: the fewer genetic differences identified between a pair of isolates, the less time has elapsed since they diverged from a common ancestor. Hierarchical single linkage clustering of pairwise single nucleotide polymorphism (SNP) distances has been employed to detect and investigate outbreaks. Here, we evaluated the evidence base for the interpretation of phylogenetic clusters of Shiga toxin-producing Escherichia coli (STEC) O157:H7. Whole genome sequences of 1193 isolates of STEC O157:H7 submitted to Public Health England between July 2015 and December 2016 were mapped to the Sakai reference strain. Hierarchical single linkage clustering was performed on the pairwise SNP differences between all isolates at descending distance thresholds. Cases with known epidemiological links fell within 5-SNP single linkage clusters. Cases in 5-SNP single linkage community clusters for which no epidemiological link was identified were more likely to be temporally and/or geographically related than sporadic cases. Ten-SNP single linkage clusters occurred infrequently and were challenging to investigate because cases were few and temporally and/or geographically dispersed. A single linkage cluster threshold of 5 SNPs has utility for the detection of outbreaks linked to both persistent and point sources. Deeper phylogenetic analysis revealed that the distinction between domestic UK and imported isolates could be inferred at the sub-lineage level. Cases of domestically acquired infection that fall within predominantly travel-associated clusters are likely to be caused by contaminated imported food.
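To make the clustering step concrete, the following is a minimal Python sketch of single linkage clustering on pairwise SNP distances at descending thresholds. It is not the pipeline used in the study: the five isolates, their distances, the threshold series and the function name `single_linkage_clusters` are hypothetical and chosen only for illustration. The sketch relies on the fact that cutting a single linkage hierarchy at threshold t gives the connected components of the graph that joins every pair of isolates at most t SNPs apart.

```python
# Hypothetical pairwise SNP distances between five isolates (indices 0-4).
# These values are invented for illustration and are not data from the study.
DIST = {
    (0, 1): 2, (0, 2): 6, (1, 2): 4,
    (0, 3): 12, (1, 3): 11, (2, 3): 9,
    (0, 4): 40, (1, 4): 41, (2, 4): 39, (3, 4): 38,
}
N_ISOLATES = 5


def single_linkage_clusters(dist, n, threshold):
    """Return one cluster label per isolate at the given SNP threshold.

    Two isolates share a label if they are joined by a chain of pairs
    that each differ by at most `threshold` SNPs. The label is the
    lowest isolate index in the cluster.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), d in dist.items():
        if d <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as root so labels are stable.
                parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(n)]


if __name__ == "__main__":
    # Assumed descending thresholds; the abstract does not list the series.
    for t in (50, 25, 10, 5, 0):
        print(t, single_linkage_clusters(DIST, N_ISOLATES, t))
```

In this toy data, isolates 0 and 2 differ by 6 SNPs yet fall in the same 5-SNP cluster because each is within 5 SNPs of isolate 1. This chaining is the defining property of single linkage and is one reason such clusters can capture outbreaks from persistent sources, where isolates accumulate differences over time.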