Abstract

Epidemiological studies of communicable diseases increasingly use large whole-genome sequencing (WGS) datasets to explore the transmission of pathogens. It is important to obtain an initial overview of such datasets and to identify closely related isolates, but this can be challenging with large numbers of isolates and imperfect sequencing. We used an ad hoc whole-genome multilocus sequence typing method to summarise data from a longitudinal study of Staphylococcus aureus in a primary school in New Zealand. Each pair of isolates was compared, and the number of genes at which alleles differed between the two isolates was tallied to produce a matrix of “allelic differences”. We plotted histograms of the number of allelic differences for all isolate pairs, for pairs of isolates from different individuals, and for pairs of isolates from the same individual. A total of 340 sequenced isolates were included, and the ad hoc shared genome contained 445 genes. There were between 0 and 420 allelic differences between isolate pairs, and the majority of pairs had more than 260 allelic differences. We found many genetically closely related S. aureus isolates from single individuals and a smaller number of closely related isolates from separate individuals. Multiple S. aureus isolates from the same individual were usually very closely related or identical over the ad hoc shared genome. Siblings carried genetically similar, but not identical, isolates. An ad hoc shared genome approach to WGS analysis can accommodate imperfect sequencing of the included isolates, and can provide insights into relationships between isolates in epidemiological studies with large WGS datasets containing diverse isolates.
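The pairwise tally described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `profiles` and `hosts` data structures, and the toy allele numbers are all hypothetical, and the sketch assumes every isolate already has an allele call for every gene in the shared genome.

```python
from itertools import combinations

def allelic_difference_matrix(profiles, shared_genes):
    """Count, for each pair of isolates, the shared genes whose alleles differ.

    profiles: dict mapping isolate name -> dict mapping gene -> allele id
              (hypothetical structure, assumed complete for shared_genes).
    Returns the sorted isolate names and a symmetric matrix (list of lists).
    """
    names = sorted(profiles)
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = profiles[names[i]], profiles[names[j]]
        diff = sum(1 for gene in shared_genes if a[gene] != b[gene])
        matrix[i][j] = matrix[j][i] = diff
    return names, matrix

def split_by_host(names, matrix, hosts):
    """Separate pairwise differences into same-individual and
    different-individual pairs, ready to be plotted as histograms."""
    within, between = [], []
    for i, j in combinations(range(len(names)), 2):
        same = hosts[names[i]] == hosts[names[j]]
        (within if same else between).append(matrix[i][j])
    return within, between

# Toy data (invented for illustration only).
shared_genes = ["g1", "g2", "g3"]
profiles = {
    "A1": {"g1": 1, "g2": 1, "g3": 2},
    "A2": {"g1": 1, "g2": 1, "g3": 2},
    "B1": {"g1": 1, "g2": 3, "g3": 4},
}
hosts = {"A1": "childA", "A2": "childA", "B1": "childB"}

names, matrix = allelic_difference_matrix(profiles, shared_genes)
within, between = split_by_host(names, matrix, hosts)
```

In this toy example the two isolates from the same individual are identical over the three genes, while each differs from the other individual's isolate at two genes. The `within` and `between` lists could then be passed to any histogram routine.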

Highlights

  • Epidemiological studies of communicable diseases increasingly use large whole-genome sequencing (WGS) datasets to explore the transmission of pathogens

  • Comparisons of isolates based on accessory genomes, single nucleotide polymorphisms (SNPs), recombinations and deletions in such large datasets lack the means to quantify differences between isolates in a way that can be applied across the whole dataset

  • We used an ad hoc shared genome method to compare S. aureus isolates obtained from a longitudinal study in a primary school, in order to obtain an overview of the dataset and identify closely related isolates

Summary

Introduction

Epidemiological studies of communicable diseases increasingly use large whole-genome sequencing (WGS) datasets to explore the transmission of pathogens. “Ad hoc shared genome” approaches have the potential to avoid the exclusion of large numbers of sequenced isolates that occurs in analyses using predefined loci, by comparing only genes that are present, and whose nucleotide sequences are known with high certainty, in all isolates (genes might otherwise appear falsely absent because of incorrect assembly or incomplete coverage) [5]. Such ad hoc approaches provide better resolution than 7-locus MLST [6,7]. We used an ad hoc shared genome method to compare S. aureus isolates obtained from a longitudinal study in a primary school, in order to obtain an overview of the dataset and identify closely related isolates.
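As a rough illustration of the idea of an ad hoc shared genome, the sketch below keeps only the genes that have a confident allele call in every isolate. It is an assumption-laden toy, not the method used in the study: the `calls` structure, the use of `None` to mark an absent or uncertain gene, and the function name are all hypothetical.

```python
def ad_hoc_shared_genes(calls):
    """Return the set of genes with a confident allele call in every isolate.

    calls: dict mapping isolate name -> dict mapping gene -> allele id,
           where None marks a gene that is absent or called with low
           certainty (a hypothetical convention for this sketch).
    """
    confident_sets = [
        {gene for gene, allele in genes.items() if allele is not None}
        for genes in calls.values()
    ]
    if not confident_sets:
        return set()
    # A gene survives only if every isolate has a confident call for it.
    return set.intersection(*confident_sets)

# Toy data (invented for illustration only).
calls = {
    "iso1": {"g1": 5, "g2": 7, "g3": 1},
    "iso2": {"g1": 5, "g2": None, "g3": 2},  # g2 uncertain in iso2
    "iso3": {"g1": 6, "g3": 1},              # g2 missing from the assembly
}
shared = ad_hoc_shared_genes(calls)
```

Here `g2` is dropped because it is uncertain in one isolate and missing in another, but no isolate is excluded from the comparison; only the gene set shrinks.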
