Optimization of DADA2 in QIIME2 for improving fidelity in 16S rRNA V4 amplicon data analysis

Abstract

High-throughput sequencing generates vast amounts of data, often containing low-quality bases, chimeras, and artifacts that can mislead taxonomic classification and diversity assessments. The Divisive Amplicon Denoising Algorithm 2 (DADA2) enhances taxonomic resolution by excluding low-quality bases and optimizing amplicon sequence variant (ASV) inference. Proper truncation reduces computational load while retaining the key hypervariable-region sequence needed for accurate classification. In this study, we examine how different truncation lengths in the DADA2 analysis affect statistical robustness and the reliability of microbial community profiling in ecological and environmental studies. Truncating reads to lengths between 175 and 185 bp improves the recovery rate of quality reads and preserves microbial diversity in the V4 hypervariable region of Illumina paired-end reads. Adopting an optimal truncation length strategy therefore maximizes read recovery while preserving the richness and evenness of microbial communities.
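The trade-off described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method. It assumes a V4 amplicon of about 253 bp after primer removal (515F/806R primers), DADA2's default 12-nt minimum overlap for merging paired reads, and a simple "truncate where median quality first drops below Q30" rule. All helper names are hypothetical. The sketch checks that the truncated forward and reverse reads can still overlap, computes read recovery, and reports richness and evenness for a sample's ASV counts.

```python
import math

# Assumption: 515F/806R V4 amplicon length after primer removal (~253 bp).
# Use ~292 bp instead if primers are still present in the reads.
V4_AMPLICON_BP = 253
# Assumption: DADA2's default minimum overlap for merging paired reads.
MIN_OVERLAP = 12


def merge_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_bp: int = V4_AMPLICON_BP) -> int:
    """Bases shared by the truncated forward and reverse reads (negative = gap)."""
    return trunc_len_f + trunc_len_r - amplicon_bp


def last_good_position(median_q: list[float], min_q: float = 30.0) -> int:
    """Length of the longest read prefix whose median quality stays >= min_q."""
    for i, q in enumerate(median_q):
        if q < min_q:
            return i
    return len(median_q)


def choose_trunc_lengths(median_q_f: list[float], median_q_r: list[float],
                         min_q: float = 30.0, amplicon_bp: int = V4_AMPLICON_BP,
                         min_overlap: int = MIN_OVERLAP):
    """Hypothetical heuristic: quality-based cut points, rejected if reads cannot merge."""
    f = last_good_position(median_q_f, min_q)
    r = last_good_position(median_q_r, min_q)
    if merge_overlap(f, r, amplicon_bp) < min_overlap:
        return None  # too much trimmed: pairs would fail to merge and be lost
    return f, r


def recovery_rate(input_reads: int, non_chimeric_reads: int) -> float:
    """Percentage of input reads surviving filtering, merging and chimera removal."""
    return 100.0 * non_chimeric_reads / input_reads if input_reads else 0.0


def richness_and_evenness(counts: list[int]) -> tuple[int, float, float]:
    """Observed ASV richness, Shannon index (natural log) and Pielou's evenness."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    richness = len(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness


if __name__ == "__main__":
    for trunc in (150, 175, 185, 200):
        overlap = merge_overlap(trunc, trunc)
        print(f"truncLen {trunc}/{trunc}: overlap {overlap} nt, mergeable={overlap >= MIN_OVERLAP}")
```

Under these assumptions every length in the demo loop still merges, so overlap alone does not single out 175 to 185 bp. Within the mergeable range the choice is driven by read quality and recovery, which the study measures empirically. In QIIME2, the selected values would be passed to `qiime dada2 denoise-paired` as `--p-trunc-len-f` and `--p-trunc-len-r`. Recovery can be read from the `input` and `non-chimeric` columns of the denoising stats.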

Similar Papers
  • Research Article
  • 10.1023/a:1026693512265
Diversity of nucleotide sequences in hypervariable region 1 of hepatitis C virus in Japanese patients with chronic hepatitis C of unknown mode of transmission.
  • Jan 1, 1999
  • Digestive diseases and sciences
  • Hidenori Toyoda

We evaluated the sequence diversity of the hypervariable region 1 (HVR1) of hepatitis C virus (HCV) in HCV-infected patients in whom the mode of transmission is unknown. The sequence diversity of HVR1 in 26 Japanese patients with chronic HCV infection of unknown mode of transmission (UT) was compared with 17 patients with chronic posttransfusion hepatitis C in whom only a single HCV infection had occurred (PH), and with 18 patients with hemophilia with chronic HCV infection who might have been multiply infected with HCV (HE). The diversity of HVR1 was evaluated by direct sequencing after PCR amplification of HVR1. The sequence diversity of HVR1 was 10.1 +/- 7.7% in UT, 2.7 +/- 2.8% in PH, and 14.6 +/- 6.9% in HE. The diversity of the patients with unknown transmission was greater than that of the posttransfusion hepatitis patients, and in some patients it was similar to that of multitransfused hemophiliac patients (UT vs PH, P = 0.0004; UT vs HE, P = 0.04; and HE vs PH, P < 0.0001). Multiple infections with HCV could have occurred frequently in patients with chronic HCV infection in whom the mode of transmission was unknown, which increased the sequence diversity of HVR1 of these patients.

  • Research Article
  • Citations: 386
  • 10.1093/mollus/eym029
Review of the systematics and global diversity of freshwater mussel species (Bivalvia: Unionoida)
  • Nov 1, 2007
  • Journal of Molluscan Studies
  • Daniel L Graf + 1 more

Freshwater mussels (Bivalvia: Unionoida) are interesting because of their unique life cycles, global aggregate distribution and ancient origin. They are also of practical importance due to their worldwide, imperiled status. Of utmost utility for their continued study are a modern assessment of global and regional species diversity and a natural classification that reflects phylogenetic patterns. The freshwater malacological community has taken steps toward satisfying the latter of these requirements, but a consensus census of mussel species has not been published since Fritz Haas’s revisions of the late 1960s. We set out to describe the species-level diversity of the Unionoida by reviewing the secondary literature and developing a comprehensive taxonomic database. Each valid species was assigned to one or more geographical regions (i.e. Nearctica, Neotropica, Afrotropica, Palearctica, Indotropica and Australasia) and one or more subregions, and each valid genus was assigned to the lowest possible level in a classification derived from our own, recent phylogenetic analyses. Based upon a consensus of numerous regional works, our global estimate of freshwater mussel diversity is 840 species. Regional diversity was determined as follows: Nearctica: 302 spp., Neotropica: 172, Afrotropica: 85, Palearctica: 45, Indotropica: 219 and Australasia: 33. The largest family is the Unionidae, with 674 species. However, the classification of that taxon is currently in flux, and many genera (corresponding to 225 spp.) were assigned to incertae sedis geographical assemblages. Diversity patterns are discussed, and it is suggested that reevaluation of these faunas with modern methods will likely increase recognized species diversity, especially on the southern continents. Our checklist and classification of freshwater mussel species is included as an appendix and mirrored on the MUSSEL Project Web Site (http://www.mussel-project.net/).

  • Research Article
  • 10.1017/qua.2024.55
Taxonomic and functional diversity of North American vegetation during the last interglacial–glacial cycle
  • Feb 26, 2025
  • Quaternary Research
  • Timothy Terlizzi + 1 more

We synthesized pre-last glacial maximum pollen records to reconstruct North American pollen diversity since ca. 130 ka. Using taxonomic diversity (a measure of the number and abundance of taxa) and functional diversity (a measure of the number and abundance of different phenotypes) we identified temporal and spatial diversity trends for six North American bioregions: Arctic, Intermountain West, Mexico, Pacific Northwest, Southeast, and Yucatán. Reconstructed taxonomic temporal and spatial trends vary among bioregions, with regional diversity patterns captured in the functional metric, suggesting shifts in species composition coincide with shifts in ecosystem function. However, significant shifts in taxonomic pollen diversity differed in frequency, magnitude, and timing from their functional counterparts. Variations in both regional taxonomic and functional diversity response to global and regional temperature trends were evident, suggesting temperature alone does not fully explain changes in species composition. Regional richness estimates exhibited higher stability relative to the weighted diversity estimates indicating low levels of species turnover through Late Quaternary warming–cooling phases. Shifts in regional diversity did not predictably respond to stadial and interstadial transitions. Instead, North American patterns of plant diversity over the last ca. 130 ka differ geographically, likely responding to regional rather than global climate change.

  • Research Article
  • Citations: 99
  • 10.1007/s10641-009-9497-0
Genetic diversity in the mtDNA control region and population structure in the small yellow croaker Larimichthys polyactis
  • Jun 24, 2009
  • Environmental Biology of Fishes
  • Yongshuang Xiao + 5 more

The genetic diversity and population genetic structure of the small yellow croaker (Larimichthys polyactis) were investigated. One hundred and fourteen individuals were sampled from 8 localities in the Yellow Sea and the northern East China Sea. Genetic variation was examined in DNA sequences from the first hypervariable region (HVR-1) of the mitochondrial DNA control region. High haplotype diversity (h = 0.98 ± 0.87%) was detected in the HVR-1 region, indicating a high level of genetic diversity. A total of 84 polymorphic sites were found, and 87 haplotypes were defined. The pairwise nucleotide differences between samples ranged from 3.83 ± 2.19 to 6.56 ± 3.25. The demographic history of L. polyactis was examined using neutrality tests and mismatch distribution analysis, which indicated a Pleistocene population expansion at about 49,300–197,000 years. The star-burst structure of the minimum spanning tree also suggested a very recent origin for most haplotypes. Hierarchical analysis of molecular variance (AMOVA) and conventional population Fst comparisons revealed no significant genetic structure throughout the examined range, which is inconsistent with previous findings based on morphological and ecological studies. Long-term dispersal and high gene flow have likely contributed to the genetically homogeneous population structure of the species. Knowledge of its genetic diversity and genetic structure will be crucial for establishing appropriate fishery management stocks for the species.

  • Research Article
  • Citations: 3
  • 10.1094/pbiomes-06-22-0037-a
Fungal Metabarcoding Data for Two Grapevine Varieties (Regent and Vitis vinifera ‘Cabernet-Sauvignon’) Inoculated with Powdery Mildew (Erysiphe necator) Under Drought Conditions
  • Nov 15, 2022
  • Phytobiomes Journal
  • Corinne Vacher + 7 more


  • Research Article
  • 10.46810/tdfd.1667574
Spatial Patterns of Species Diversity in the Saline Vegetation of Central Anatolia, Türkiye
  • Jun 27, 2025
  • Türk Doğa ve Fen Dergisi
  • Didem Ambarlı

Vegetation on saline soils thrives under extreme conditions. The saline vegetation of Central Anatolia is a key component of the Irano-Anatolian Biodiversity Hotspot, notable for its high habitat and species diversity. However, there has been a lack of quantitative assessments of plant diversity in these areas. To address this gap, this study aims to calculate and compare: 1) local species diversity (alpha diversity) across five vegetation alliances, 2) regional diversity (gamma diversity) for each alliance, and 3) the variation in species diversity within alliances (beta diversity). Data from 101 plots representing five alliances collected from Burdur Lake, Acıgöl, Salt Lake, Seyfe Lake, and Sultansazlığı were compiled from relevant publications. The results showed high species diversity in areas with high variation in salinity or humidity due to ecotone characteristics, at all spatial scales. Notably, diversity was highest in salt steppes (Achilleo wilhelmsii-Artemision santonici) and in slightly-saline summer-dry marshes (Lepidio caespitosi-Limonion iconici and Inulo aucheranae-Elymion salsi). Conversely, diversity was lower in non-saline steppes typical for gypsum soils (Astragalo karamasici-Gypsophilion eriocalysis) and in communities found on hypersaline soils (Salicornion fruticosae). Overall, beta diversity was high, reflecting significant species turnover. These findings numerically support existing literature, which suggests that plant community composition can change drastically over short distances. The results highlight the conservation priority of saline areas with ecotone characteristics.

  • Research Article
  • Citations: 121
  • 10.1034/j.1600-0587.2002.250504.x
Geographic range, turnover rate and the scaling of species diversity
  • Aug 20, 2002
  • Ecography
  • Héctor T Arita + 1 more

The study of the relative roles of local and regional processes in determining the scaling of species diversity is a very active field in current ecology. The importance of species turnover and the species‐range‐size frequency distributions in determining how local and regional species diversity are linked has been recognised by recent approaches. Here we present a model, based on a system of fully nested sampling quadrats, to analyse species diversity at several scales. Using a recursive procedure that incorporates increasingly smaller scales and a multiplicative formula for relating local and regional diversity, the model allows the simultaneous depiction of alpha, beta and gamma diversity in a single “species‐scale plot”. Species diversity is defined as the number of ranges that are intersected by sampling quadrats of various sizes. The size, shape and location of individual species ranges determine diversity at any scale, but the average point diversity, measured at hypothetical zero‐area localities, is determined solely by the size of individual ranges, regardless of their shape and location. The model predicts that if the species‐area relationship is a power function, then beta diversity must be scale invariant if measured at constant scale increments. Applying the model to the mammal fauna of four Mexican regions with contrasting environmental conditions, we found that: 1) the species‐range‐size frequency distribution at the scale of the Mexican regions differs from the log‐normal pattern reported for the national and continental scales. 2) Beta diversity is not scale‐invariant within each region, implying that the species‐area relationship (SAR) does not follow a power function. 3) There is geographic variation in beta diversity. 4) The scaling of diversity is directly linked to patterns of species turnover rate, and ultimately determined by patterns in the geographic distribution of species. 
The model shows that regional species diversity and the average distribution range of species are the two basic data necessary to predict patterns in the scaling of species diversity.
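For readers unfamiliar with the "multiplicative formula for relating local and regional diversity", Whittaker's classic multiplicative partition is given below as a reference point. The abstract does not state the exact form the model uses, so treating it as this partition is an assumption. Here $\gamma$ is regional richness, $\bar{\alpha}$ is mean local (per-quadrat) richness, and $\beta$ measures turnover as the effective number of distinct local assemblages.

```latex
\gamma = \bar{\alpha}\,\beta
\qquad\Longleftrightarrow\qquad
\beta = \frac{\gamma}{\bar{\alpha}},
\qquad\text{e.g. } \gamma = 120,\ \bar{\alpha} = 30 \;\Rightarrow\; \beta = 4 .
```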

  • Research Article
  • Citations: 34
  • 10.1111/btp.12030
Assessing the Relative Efficiency of Termite Sampling Methods along a Rainfall Gradient in African Savannas
  • Mar 1, 2013
  • Biotropica
  • Andrew B Davies + 3 more

Although termites are ecosystem engineers in tropical and sub‐tropical environments, the study of termite ecology is often constrained by sampling difficulties and a lack of established sampling protocols, particularly for savannas. The efficiency and relevance of different methods along climatic gradients, even within a single biome, is largely unknown. Here, we compare the relative contribution of two commonly used sampling methods, cellulose baits and active searching transects, in quantifying savanna termite diversity along a rainfall gradient in South Africa; sampling was conducted during the wet season across four markedly different savanna types. We also assessed the usefulness of different forms of baiting techniques. The relative efficiency of sampling method varied with annual rainfall. In arid savannas, baiting was as effective as active searching transects at sampling termite diversity, and we recommend baiting there because it is less labor intensive. In savannas of moderately low to intermediate rainfall, baiting and transects sampled different termite species, so both are necessary for an accurate assessment of termite diversity. In contrast, in wetter savannas transects gave a better assessment of diversity, with cellulose baits contributing little to the diversity assessment. The efficiency of baiting techniques differed across the rainfall gradient, with baits needing to be left in the field for a longer period in more arid savannas. We conclude that habitat type, even within a single biome, will determine the sampling method or methods necessary to quantify termite diversity accurately.

  • Research Article
  • Citations: 5
  • 10.3390/ijms26168090
Cervicovaginal Microbiome and HPV: A Standardized Approach to 16S/ITS NGS and Microbial Community Profiling for Viral Association
  • Aug 21, 2025
  • International Journal of Molecular Sciences
  • Jane Shen-Gunther + 3 more

16S rRNA next-generation sequencing (NGS) has significantly advanced cervicovaginal microbiome profiling, offering insights into the relationship between vaginal dysbiosis and HPV-associated carcinogenesis. However, reliance on a limited set of 16S hypervariable regions introduces inherent biases that impact results. This study developed standardized workflows for 16S/ITS NGS, with a focus on identifying methodological biases that influence microbial abundance and taxonomic specificity. Commercial NGS tools were employed, including the 16S/ITS QIAseq V1–V9 screening panel, ATCC vaginal microbial standard, and CLC Genomics Workbench integrated with a customized database (VAGIBIOTA) for analysis. The microbial communities of 66 cervical cytology samples were characterized. Among the regions tested, V3V4 exhibited the least quantitative bias, while V1V2 offered the highest specificity. Microbial profiles and Community State Types (CST) (I–V) were broadly consistent with prior studies, with Lactobacillus abundance clustering into three states: L.-dominant (CST I–III, V), L.-diminished (CST IV-A), and L.-depleted (CST IV-B). Differential abundance analysis revealed that anaerobic opportunistic pathogens dominant in CST IV-B (dysbiosis) were also enriched in HSIL and HPV-16 positive samples. Our findings revealed distinct differences in species identification across 16S rRNA hypervariable regions, emphasizing the importance of region selection in clarifying microbial contributions to HPV-associated carcinogenesis.

  • Research Article
  • Citations: 165
  • 10.1073/pnas.1018426108
Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments
  • Jul 25, 2011
  • Proceedings of the National Academy of Sciences
  • Béatrice Lecroq + 8 more

Deep-sea floors represent one of the largest and most complex ecosystems on Earth but remain essentially unexplored. The vastness and remoteness of this ecosystem make deep-sea sampling difficult, hampering traditional taxonomic observations and diversity assessment. This problem is particularly true in the case of the deep-sea meiofauna, which largely comprises small-sized, fragile, and difficult-to-identify metazoans and protists. Here, we introduce an ultra-deep sequencing-based metagenetic approach to examine the richness of benthic foraminifera, a principal component of deep-sea meiofauna. We used Illumina sequencing technology to assess foraminiferal richness in 31 unsieved deep-sea sediment samples from five distinct oceanic regions. We sequenced an extremely short fragment (36 bases) of the small subunit ribosomal DNA hypervariable region 37f, which has been shown to accurately distinguish foraminiferal species. In total, we obtained 495,978 unique sequences that were grouped into 1,643 operational taxonomic units, of which about half (841) could be reliably assigned to foraminifera. The vast majority of the operational taxonomic units (nearly 90%) were either assigned to early (ancient) lineages of soft-walled, single-chambered (monothalamous) foraminifera or remained undetermined and yet possibly belong to unknown early lineages. Contrasting with the classical view of multichambered taxa dominating foraminiferal assemblages, our work reflects an unexpected diversity of monothalamous lineages that are as yet unknown using conventional micropaleontological observations. Although we can only speculate about their morphology, the immense richness of deep-sea phylotypes revealed by this study suggests that ultra-deep sequencing can improve understanding of deep-sea benthic diversity considered until now as unknowable based on a traditional taxonomic approach.

  • Research Article
  • Citations: 31
  • 10.1007/s00122-003-1403-0
Assessment of cytochrome P450 sequences offers a useful tool for determining genetic diversity in higher plant species.
  • Sep 13, 2003
  • Theoretical and Applied Genetics
  • S Yamanaka + 5 more

To investigate and develop new genetic tools for assessing genome-wide diversity in higher plant species, polymorphisms of gene analogues of mammalian cytochrome P450 mono-oxygenases were studied. Data mining on Arabidopsis thaliana indicated that a small number of primer sets derived from P450 genes could provide universal tools for assessing genome-wide genetic diversity in diverse plant species that lack relevant genetic markers, or for which there is no prior knowledge of inheritance traits. PCR amplification of 51 plant species from 28 taxonomic families using P450 gene primer sets suggested that plants contain at least several analogues of mammalian P450 genes. Intra- and interspecific variation was demonstrated following PCR amplification of P450 analogue fragments, suggesting that these would be effective genetic markers for assessing genetic diversity in plants. In addition, BLAST search analysis revealed that these amplified fragments were homologous to other genes and proteins in different plant varieties. We conclude that the sequence diversity of P450 gene analogues in different plant species reflects the diversity of functional regions in the plant genome and is therefore an effective tool in functional genomic studies of plants.

  • Research Article
  • Citations: 36
  • 10.1111/lam.13005
Sex-related differences in the thanatomicrobiome in postmortem heart samples using bacterial gene regions V1-2 and V4.
  • Jun 6, 2018
  • Letters in Applied Microbiology
  • C.R Bell + 3 more

The findings represent preliminary data of the first thanatomicrobiome investigation of a comparison between 16S rRNA gene V1-2 and V4 amplicon signatures in corpse heart tissues. The results demonstrated that V4 hypervariable region amplicons had statistically significant (P<0·05) sex-dependent microbial diversity. For example, Streptococcus sp. was solely found in male postmortem heart tissues. Interestingly, the results also show that V4 amplicons had higher abundance of Clostridium sp. and Pseudomonas sp. in female heart tissues compared to males. The finding of Clostridium sp. supports the postmortem clostridium effect in corpse heart tissues.

  • Research Article
  • Citations: 4
  • 10.5812/pedinfect.36433
High Diversity of Methicillin-Resistant Staphylococcus aureus (MRSA) Isolates Based on Hypervariable Region Polymorphisms
  • Aug 2, 2016
  • Archives of Pediatric Infectious Diseases
  • Seyed Foad Mirkarimi + 5 more

Background: Methicillin-resistant Staphylococcus aureus (MRSA) is considered one of the most important pathogenic bacteria and most prevalent pathogens causing dangerous infections in humans. Objectives: The purpose of this study was to analyze the hypervariable region (HVR) diversity of clinical MRSA isolates in Tabriz, northwestern Iran. Methods: In this retrospective and descriptive study, from Staphylococcus aureus strains isolated from clinical specimens of hospitalized patients from 2006 to 2013 at Tabriz health centers, 151 isolates were randomly selected. Methicillin-resistant isolates were identified by the agar disk diffusion method and mecA PCR assays. The genetic diversity of the isolates in the HVR were analyzed with the HVR typing method. Results: According to the antibiogram test results, from 151 samples, 52 isolates (34.4%) were resistant to cefoxitin. However, based on the polymerase chain reaction (PCR) assay, 54 isolates (35.8%) had the mecA gene and were identified as MRSA strains. According to PCR of the mecHVR, these MRSA strains were classified into seven different genotypes of HVR groups. Conclusions: High HVR diversity among the studied MRSA isolates could be a result of insufficient or inadequate infection-control protocols in Tabriz hospitals. Moreover, the high number of HVR genotypes showed that HVR typing can be used along with other typing methods in epidemiological studies of MRSA as a useful tool for monitoring, tracking contaminations, and controlling infections in hospital settings.

  • Research Article
  • Citations: 3
  • 10.1016/j.dib.2018.01.011
Data on haplotype diversity in the hypervariable region I, II and III of mtDNA amongst the Brahmin population of Haryana
  • Jan 31, 2018
  • Data in Brief
  • Kapil Verma + 4 more

Human mitochondrial DNA (mtDNA) is routinely analysed for pathogenic mutations, evolutionary studies, estimation of time of divergence within or between species, phylogenetic studies and identification of degraded remains. Data on various regions of human mtDNA have added enormously to the knowledge pool of population genetics as well as forensic genetics. The displacement loop (D-loop) in the control region of mtDNA is regarded as the most rapidly evolving part of the molecule, owing to the variation present in this region. The control region consists of three hypervariable regions. These hypervariable regions (HVI, HVII and HVIII) tend to mutate 5–10 times faster than nuclear DNA. The high mutation rate of these hypervariable regions is used in population genetic studies and human identity testing. In the present data, potentially informative hypervariable regions of mitochondrial DNA (mtDNA), i.e. HVI (np 16024–16365), HVII (np 73–340) and HVIII (np 438–576), were analysed to understand the genetic diversity amongst the Brahmin population of Haryana. Blood samples had been collected from maternally unrelated individuals from the different districts of Haryana. An array of parameters was estimated, comprising polymorphic sites, transitions, transversions, deletions, gene diversity, nucleotide diversity, pairwise differences, Tajima's D test, Fu's Fs test, mismatch observed variance and expected heterozygosity. The observed polymorphisms were assigned to their respective haplogroups by comparison with the rCRS.

  • Research Article
  • Citations: 86
  • 10.1111/j.1365-2958.1991.tb00789.x
Characterization of the opa (class 5) gene family of Neisseria meningitidis
  • Jun 1, 1991
  • Molecular Microbiology
  • E L Aho + 4 more

Class 5 outer membrane proteins of Neisseria meningitidis show both phase- and antigenic variation of expression. The proteins are encoded by a family of opa genes that share a conserved framework interspersed with three variable regions, designated the semivariable (SV) region and hypervariable regions 1 (HV1) and 2 (HV2). In this study, we determined the number and DNA sequence of all of the opa genes of meningococcal strain FAM18, to assess the structural and antigenic variability in the family of proteins made by one strain. Pulsed field electrophoresis and Southern blotting showed that there are four opa genes in the FAM18 chromosome, and that they are not tightly clustered. DNA sequence analysis of the four cloned genes showed a modest degree of diversity in the SV region and more extensive differences in the HV1 and HV2 regions. There were four versions of HV1 and three versions of HV2 among the four genes. Each of the FAM18 opa loci contained a gene with a unique combination of SV, HV1, and HV2 sequences. We used lambda gt11 cloning and synthetic peptides to demonstrate that HV2 sequences completely encode the epitopes for two monoclonal antibodies specific for different class 5 proteins of FAM18.
