Abstract

Enterohemorrhagic Escherichia coli (EHEC) continues to be a significant public health risk. With the advent of next-generation sequencing, whole genome sequences require a new paradigm of analysis relevant to epidemiology and drug discovery. A large-scale bacterial population genomic analysis was applied to 702 isolates of serotypes associated with EHEC, resulting in five pangenome clusters. Incongruence between serotypes and pangenome types suggests recombination clusters. Core genome analysis was performed to determine the population-wide distribution of sdiA as a potential drug target. Protein modelling revealed that nonsynonymous variants are notably absent from the ligand binding site for quorum sensing, indicating that the population-wide conservation of the sdiA ligand site can be targeted for potential prophylactic purposes. Applying pathotype-wide pangenomics as a guide for determining the evolution of pharmacophore sites is a potential approach in drug discovery.

Highlights

  • Enterohemorrhagic Escherichia coli continues to be a significant public health risk

  • One of the more prominent strains of Escherichia coli is the enterohemorrhagic E. coli (EHEC) pathotype, associated with global outbreaks of bloody diarrhea and hemolytic uremic syndrome (HUS), usually through consumption of undercooked beef[1]

  • Pangenome-based clustering, which integrated core and accessory elements, was applied to 702 whole genome sequences from serotypes associated with EHEC, drawn from diverse environmental sources as well as animal and human hosts, in order to capture the evolutionary space



Methods

EHEC population

EHEC-associated serotypes were defined based on a previous study[7]. Whole genome sequences with the associated EHEC metadata were downloaded from Enterobase 1.1.2 using a keyword search for the respective serotypes within the E. coli species[8]. This search yielded 702 genomes from environmental, animal and clinical samples. GFF files were extracted as input for the pangenome pipeline Roary 3.11.2, run with the option for not splitting paralogs (roary -s -p 32 *.gff). The resulting presence/absence matrix, together with the accessory genome phylogeny, was visualized in Phandango 1.3.0 and is shown in Figure 1B[11]. The pangenome enables clustering of isolates based on gene presence and absence.
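To illustrate how a gene presence/absence matrix can separate core from accessory genes and support clustering of isolates, the following is a minimal sketch and not the pipeline used in the study. It assumes a tab-delimited table in the style of Roary's gene_presence_absence.Rtab output (a "Gene" column followed by one 0/1 column per isolate). The 99% core threshold, the Jaccard distance, the function names and the toy data (isolate names and presence calls) are all assumptions made for illustration.

```python
import csv
import io
from itertools import combinations


def read_rtab(handle):
    """Parse a tab-delimited presence/absence table.

    Assumed layout: header row 'Gene<TAB>isolate1<TAB>isolate2...',
    then one row per gene with 0/1 calls.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    isolates = header[1:]
    genes = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return isolates, genes


def split_core_accessory(genes, n_isolates, core_fraction=0.99):
    """Label a gene as core if present in at least core_fraction of isolates.

    The 0.99 cutoff is an assumption chosen for this sketch.
    """
    core, accessory = [], []
    for gene, calls in genes.items():
        if sum(calls) >= core_fraction * n_isolates:
            core.append(gene)
        else:
            accessory.append(gene)
    return core, accessory


def jaccard_distances(isolates, genes):
    """Pairwise Jaccard distance between isolates based on gene content."""
    profiles = {
        iso: {g for g, calls in genes.items() if calls[i]}
        for i, iso in enumerate(isolates)
    }
    dist = {}
    for a, b in combinations(isolates, 2):
        union = profiles[a] | profiles[b]
        shared = profiles[a] & profiles[b]
        dist[(a, b)] = 1 - len(shared) / len(union) if union else 0.0
    return dist


# Toy table with made-up presence calls for three hypothetical isolates.
toy = (
    "Gene\tiso1\tiso2\tiso3\n"
    "sdiA\t1\t1\t1\n"
    "stx2\t1\t1\t0\n"
    "eae\t1\t0\t0\n"
)
isolates, genes = read_rtab(io.StringIO(toy))
core, accessory = split_core_accessory(genes, len(isolates))
dist = jaccard_distances(isolates, genes)
```

A distance matrix of this kind could then be passed to any hierarchical clustering routine to group isolates by gene content. In the toy table, sdiA is the only gene present in every isolate and so is the only one labelled core.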
