Abstract

In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites, derived from prior comparative genomic analysis, to estimate the proportion of 40 subtypes in metagenome samples. We obtained >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project (HMP) and the metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage >0.025X. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype compositions at different body sites of the same individual were more similar than expected from random sampling, and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage, but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other unknown factors influence the taxonomic distribution of the species around the city.
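To make the idea of estimating subtype proportions from SNP-site coverage counts more concrete, the following is a minimal sketch of a binomial mixture fitted by expectation-maximization. It is not the authors' implementation. The function name `estimate_subtype_proportions`, the 0/1 `allele_matrix` encoding (1 where a subtype carries the alternate allele at a site), and the fixed per-read `error_rate` are all assumptions made for illustration. Each read at a site is treated as coming from one subtype, so the alternate-allele count at a site is binomial with a success probability that mixes over subtypes.

```python
import numpy as np

def estimate_subtype_proportions(alt_counts, depths, allele_matrix,
                                 error_rate=0.01, n_iter=200, tol=1e-9):
    """EM estimate of subtype mixing proportions (illustrative sketch only).

    alt_counts    : reads showing the alternate allele at each SNP site, shape (J,)
    depths        : total reads covering each SNP site, shape (J,)
    allele_matrix : hypothetical 0/1 matrix, shape (K, J); 1 means subtype k
                    carries the alternate allele at site j
    error_rate    : assumed per-read probability of observing the wrong allele
    """
    alt = np.asarray(alt_counts, dtype=float)
    depth = np.asarray(depths, dtype=float)
    ref = depth - alt
    A = np.asarray(allele_matrix, dtype=float)

    # Probability that a read drawn from subtype k shows the alternate allele.
    q = A * (1.0 - error_rate) + (1.0 - A) * error_rate

    n_subtypes = A.shape[0]
    pi = np.full(n_subtypes, 1.0 / n_subtypes)  # start from uniform proportions
    total_reads = depth.sum()

    for _ in range(n_iter):
        # Per-site probability of an alternate-allele read under the mixture.
        p_alt = pi @ q
        p_ref = 1.0 - p_alt

        # E-step: posterior probability that an alt (or ref) read at site j
        # originated from subtype k.
        resp_alt = (pi[:, None] * q) / p_alt
        resp_ref = (pi[:, None] * (1.0 - q)) / p_ref

        # M-step: expected fraction of all reads assigned to each subtype.
        new_pi = (resp_alt * alt + resp_ref * ref).sum(axis=1) / total_reads

        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Sites with zero depth contribute nothing to the update, which matters at very low coverage, where most SNP sites are not covered by any read.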

Highlights

  • Bacterial species are commonly comprised of multiple phylogenetic clades that have distinctive phenotypic properties

  • The binary categorical variables from the metadata that we investigated in relation to the presence of S. aureus and/or a particular S. aureus subtype in a body site were gender, breastfed or not, tobacco use, insurance information and history of previous surgery (Table S3)

  • We developed a SNP matrix based on a training set of 2,692 genetically diverse S. aureus strains downloaded from the Sequence Read Archive database


Introduction

Bacterial species commonly comprise multiple phylogenetic clades that have distinctive phenotypic properties. The process of identifying which clade a bacterial strain belongs to goes by several names, but here we will refer to it as subtyping. Commonly used subtyping methods include multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE), oligotyping and variable-number tandem-repeat (VNTR) typing. Each of these methods was developed for bacteria first isolated in pure culture in the laboratory before DNA extraction. For early disease diagnosis of pathogenic bacterial species, and to understand bacteria in the context of their natural community, it would be advantageous to subtype directly from clinical specimens such as blood and sputum. Current direct identification options, such as 16S rRNA gene sequencing, FISH and REP-PCR, cannot subtype bacteria below species-level taxonomic resolution, nor can they deal with mixtures of subtypes of the same species being present.

How to cite this article: Joseph et al. (2016), The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data.
