Abstract

Tables of estimated abundances of taxa present in metagenomic samples (including human), computed with Kraken (Wood & Salzberg, Genome Biology, 2014), which classifies reads based on k-mer identity with a reference database extracted from a large set of complete genomes. Kraken assigns reads to all taxonomic levels in a cumulative manner, so the relative abundance of a taxon can be computed as the ratio of its read count at one specific level to the total read count. Read counts were computed 1) with a conservative filter on read confidence scores, i.e. keeping only reads with more than 20% of k-mers assigned to congruent taxa (using the kraken-filter executable with option “--thresh 0.20”); and 2) in a sensitive mode, i.e. without confidence score filtering. Relative abundances were computed at the species and genus levels.

The distribution of relative abundances per sample (from the sensitive mode) showed a significant bias relative to sequencing depth for values under 10^-12, with low-depth samples being depleted in rare species (Fig. S1). The species relative abundance dataset was therefore truncated to species with values above 10^-12, decreasing the number of represented species from 8,226 to 5,323.

We used linear discriminant analysis (LDA) effect size (LEfSe) (Segata et al., 2011) to detect taxa that significantly differentiate groups of samples based on their subsistence strategy (accounting for the underlying grouping by population). We then used a simple LDA, as implemented in the ade4 R package (Dufour & Dray, 2007), to identify the species that specifically differentiate microbiomes along the human lifestyle gradient contrasting hunter-gatherers (HGs) with Western controls (WCs). Significance was assessed with pairwise t-tests, Wilcoxon rank-sum tests (using the Benjamini-Hochberg false discovery rate [FDR] correction procedure for multiple testing) and the ANCOM test (with low-stringency multiple testing correction, option ‘multcorr=2’).

Figures: Taxa discriminating between subsistence strategies under conservative and sensitive settings, respectively. WGS-based estimates of taxonomic abundance (Kraken classification from reads with assignment confidence over 20% or from unfiltered reads, respectively) were used to find (A) the best discriminant taxa based on a three-way comparison of the HG, TF and WC groups with the LEfSe algorithm (score over 3), and (B) the best discriminant species between the HG and WC groups based on a simple LDA. Species are ranked (left to right and top to bottom) by decreasing absolute LDA score. Abundances that are significantly different under a Wilcoxon rank-sum test with FDR-corrected p-values < 0.05 are indicated with an asterisk. WC: Western Controls; TF: Traditional Farmers; HG: Hunter-Gatherers.
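The relative abundance computation and the truncation of rare species described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the species labels, the read counts and the cutoff value used in the example are all hypothetical, and the cutoff shown is chosen only to make the toy data readable.

```python
from typing import Dict


def relative_abundances(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Ratio of each taxon's read count to the total at one taxonomic level."""
    total = sum(read_counts.values())
    if total == 0:
        # No classified reads at this level: every abundance is zero.
        return {taxon: 0.0 for taxon in read_counts}
    return {taxon: n / total for taxon, n in read_counts.items()}


def truncate_rare(abundances: Dict[str, float], min_abundance: float) -> Dict[str, float]:
    """Keep only taxa whose relative abundance is strictly above the cutoff."""
    return {t: a for t, a in abundances.items() if a > min_abundance}


# Hypothetical species-level read counts for a single sample.
counts = {"Species A": 900, "Species B": 99, "Species C": 1}
rel = relative_abundances(counts)

# Illustrative cutoff only; it is not the threshold used in the source.
kept = truncate_rare(rel, min_abundance=0.005)
print(sorted(kept))
```

Applying the cutoff per sample, as here, is one possible reading; the source does not state whether truncation was applied per sample or across the whole dataset.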
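For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the adjustment of a set of p-values can be sketched in a few lines. This is a generic implementation of the standard step-up procedure under the assumption that all tests are corrected together; the function name and the example p-values are hypothetical and do not come from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per species tested.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)

# Flag tests that remain significant at an FDR-corrected level of 0.05.
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

In this toy example only the smallest p-value stays below 0.05 after correction, which shows how the adjustment can remove results that look significant when taken individually.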
