Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage

Abstract

Background: In this paper we propose a method, and discuss its computational implementation as an integrated tool, for analyzing viral genetic diversity in data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are designed to exploit the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes that represents the diversity of the original population. The method proposed here is tailored to the very low error rate and extremely deep per-site coverage that are the main features of some technologies that have received little attention because their short reads preclude haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (the need for long reads, preliminary error filtering, and assembly).

Results: We propose to measure the genetic diversity of a viral population through a family of multinomial probability distributions indexed by the sites of the virus genome, each one representing the distribution of nucleic bases at that site. The implementation of the method focuses on two main optimization strategies: a read mapping/alignment procedure that aims to recover the maximum possible number of short reads, and the inference of the multinomial parameters in a Bayesian framework with smoothed Dirichlet estimation. The Bayesian approach provides conditional probability distributions for the multinomial parameters, which allows one to take into account the prior information from the control experiment and provides a natural way to separate signal from noise: it automatically furnishes Bayesian confidence intervals and thus avoids the drawbacks of preliminary error filtering.

Conclusions: The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations) and successfully tested on samples obtained from cultivations of HIV-1 strain NL4-3 (group M, subtype B) in primary human cell cultures under many distinct viral propagation conditions. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.ph/.
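
The per-site multinomial model with a Dirichlet prior can be illustrated with a minimal Python sketch. This is not Tanden's implementation (which is written in C# and uses a smoothed Dirichlet estimate informed by a control experiment); the function names, the uniform default prior, and the Monte Carlo interval are assumptions made purely for illustration.

```python
import random

BASES = "ACGT"  # order of the counts in every vector below


def site_posterior(counts, prior=(1.0, 1.0, 1.0, 1.0)):
    """Dirichlet posterior parameters for one genome site.

    counts: observed A, C, G, T read counts at the site.
    prior:  Dirichlet pseudo-counts (uniform here by assumption; counts
            from a control experiment could be supplied instead).
    """
    return [c + a for c, a in zip(counts, prior)]


def posterior_mean(alpha):
    """Posterior mean base frequencies of a Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def credible_interval(alpha, index, level=0.95, draws=20000, seed=0):
    """Equal-tailed Monte Carlo interval for one base frequency.

    The marginal of a Dirichlet component is Beta(alpha_i, sum(alpha) - alpha_i),
    so the interval is estimated from sorted Beta draws.
    """
    rng = random.Random(seed)
    a = alpha[index]
    b = sum(alpha) - a
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


# Hypothetical site with very deep coverage and a minor C variant.
counts = [99850, 100, 30, 20]
alpha = site_posterior(counts)
mean = posterior_mean(alpha)
low, high = credible_interval(alpha, BASES.index("C"))
```

With a deep site like this one, the interval around the minor-variant frequency is narrow, which is what lets the posterior separate a real low-frequency variant from background noise without a separate error-filtering step.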

Similar Papers
  • Research Article
  • Citations: 31
  • 10.1128/jvi.00561-17
The Number of Target Molecules of the Amplification Step Limits Accuracy and Sensitivity in Ultradeep-Sequencing Viral Population Studies.
  • Jul 27, 2017
  • Journal of Virology
  • Romain Gallet + 3 more

The invention of next-generation sequencing (NGS) techniques marked the coming of a new era in the detection of the genetic diversity of intrahost viral populations. A good understanding of the genetic structure of these populations requires, first, the ability to identify the different isolates or variants and, second, the ability to accurately quantify them. However, the initial amplification step of NGS studies can impose potential quantitative biases, modifying the variant relative frequencies. In particular, the number of target molecules (NTM) used during the amplification step is vastly overlooked although of primary importance, as it sets the limit of the accuracy and sensitivity of the sequencing procedure. In the present article, we investigated quantitative biases in an NGS study of populations of a multipartite single-stranded DNA (ssDNA) virus at different steps of the procedure. We studied 20 independent populations of the ssDNA virus faba bean necrotic stunt virus (FBNSV) in two host plants, Vicia faba and Medicago truncatula. FBNSV is a multipartite virus composed of eight genomic segments, whose specific and host-dependent relative frequencies are defined as the "genome formula." Our results show a significant distortion of the FBNSV genome formula after the amplification and sequencing steps. We also quantified the genetic bottleneck occurring at the amplification step by documenting the NTM of two genomic segments of FBNSV. We argue that the NTM must be documented and carefully considered when determining the sensitivity and accuracy of data from NGS studies.

IMPORTANCE: The advent of next-generation sequencing (NGS) techniques now enables study of the genetic diversity of viral populations. A good understanding of the genetic structure of these populations first requires the ability to identify the different isolates or variants and second requires the ability to accurately quantify them. Prior to sequencing, viral genomes need to be amplified, a step that potentially imposes quantitative biases and modifies the viral population structure. In particular, the number of target molecules (NTM) used during the amplification step is of primary importance, as it sets the limit of the accuracy and sensitivity of the sequencing procedure. In this work, we used 20 replicated populations of the multipartite faba bean necrotic stunt virus (FBNSV) to estimate the various limitations of ultradeep-sequencing studies performed on intrahost viral populations. We report quantitative biases during rolling-circle amplification and the NTM of two genomic segments of FBNSV.
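
Why the NTM bounds sensitivity can be seen from a simple sampling argument, sketched below in Python. The sketch assumes each target molecule is drawn independently from the population and ignores amplification bias; the function names are hypothetical and this is not the analysis performed in the article.

```python
import math


def detection_probability(freq, ntm):
    """Chance that a variant at frequency `freq` is present at least once
    among `ntm` independently sampled target molecules."""
    return 1.0 - (1.0 - freq) ** ntm


def min_ntm(freq, confidence=0.95):
    """Smallest NTM for which the variant is sampled at least once
    with probability >= confidence (requires 0 < freq < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


# A 1% variant is easily missed when only 50 molecules enter amplification,
# however deep the sequencing that follows.
p_50 = detection_probability(0.01, 50)
needed = min_ntm(0.01)
```

Under these assumptions, sequencing depth beyond the NTM adds reads but no new information about variants that never entered the amplification step.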

  • Research Article
  • Citations: 70
  • 10.1371/journal.pgen.1005838
Continuous Influx of Genetic Material from Host to Virus Populations
  • Feb 1, 2016
  • PLoS Genetics
  • Clément Gilbert + 5 more

Many genes of large double-stranded DNA viruses have a cellular origin, suggesting that host-to-virus horizontal transfer (HT) of DNA is recurrent. Yet, the frequency of these transfers has never been assessed in viral populations. Here we used ultra-deep DNA sequencing of 21 baculovirus populations extracted from two moth species to show that a large diversity of moth DNA sequences (n = 86) can integrate into viral genomes during the course of a viral infection. The majority of the 86 different moth DNA sequences are transposable elements (TEs, n = 69) belonging to 10 superfamilies of DNA transposons and three superfamilies of retrotransposons. The remaining 17 sequences are moth sequences of unknown nature. In addition to bona fide DNA transposition, we uncover microhomology-mediated recombination as a mechanism explaining integration of moth sequences into viral genomes. Many sequences integrated multiple times at multiple positions along the viral genome. We detected a total of 27,504 insertions of moth sequences in the 21 viral populations and we calculate that on average, 4.8% of viruses harbor at least one moth sequence in these populations. Despite this substantial proportion, no insertion of moth DNA was maintained in any viral population after 10 successive infection cycles. Hence, there is a constant turnover of host DNA inserted into viral genomes each time the virus infects a moth. Finally, we found that at least 21 of the moth TEs integrated into viral genomes underwent repeated horizontal transfers between various insect species, including some lepidopterans susceptible to baculoviruses. Our results identify host DNA influx as a potent source of genetic diversity in viral populations. They also support a role for baculoviruses as vectors of DNA HT between insects, and call for an evaluation of possible gene or TE spread when using viruses as biopesticides or gene delivery vectors.

  • Research Article
  • Citations: 204
  • 10.1186/1471-2164-13-475
De novo assembly of highly diverse viral populations
  • Jan 1, 2012
  • BMC Genomics
  • Xiao Yang + 9 more

Background: Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants are crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage.

Results: We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/viral-genomics-analysis-software.

Conclusions: We developed VICUNA, a publicly available software tool, that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.

  • Research Article
  • Citations: 13
  • 10.3390/v11070666
Reverse Genetics of RNA Viruses: ISA-Based Approach to Control Viral Population Diversity without Modifying Virus Phenotype
  • Jul 20, 2019
  • Viruses
  • Jean-Sélim Driouich + 3 more

Reverse genetic systems are essential for the study of RNA viruses. Infectious clones remain the most widely used systems to manipulate viral genomes. Recently, a new PCR-based method called ISA (infectious subgenomic amplicons) has been developed. This approach has resulted in greater genetic diversity of the viral populations than that observed using infectious clone technology. However, for some studies, generation of clonal viral populations is necessary. In this study, we used the tick-borne encephalitis virus as a model to demonstrate that using a very high-fidelity, DNA-dependent DNA polymerase during the PCR step of the ISA procedure makes it possible to reduce the genetic diversity of viral populations. We also concluded that the fidelity of the polymerase is not the only factor influencing this diversity. Studying the impact of genotype modification on virus phenotype is a crucial step for the development of reverse genetic methods. Here, we also demonstrated that the use of different PCR polymerases did not affect the phenotype (replicative fitness in cellulo and virulence in vivo) compared to the initial ISA procedure and the use of an infectious clone. In conclusion, we provide here an approach to control the genetic diversity of RNA viruses without modifying their phenotype.

  • Research Article
  • Citations: 609
  • 10.1146/annurev.phyto.39.1.157
Variability and genetic structure of plant virus populations.
  • Sep 1, 2001
  • Annual Review of Phytopathology
  • Fernando García-Arenal + 2 more

Populations of plant viruses, like all other living beings, are genetically heterogeneous, a property long recognized in plant virology. Only recently have the processes resulting in genetic variation and diversity in virus populations and genetic structure been analyzed quantitatively. The subject of this review is the analysis of genetic variation, its quantification in plant virus populations, and what factors and processes determine the genetic structure of these populations and its temporal change. The high potential for genetic variation in plant viruses, through either mutation or genetic exchange by recombination or reassortment of genomic segments, need not necessarily result in high diversity of virus populations. Selection by factors such as the interaction of the virus with host plants and vectors and random genetic drift may in fact reduce genetic diversity in populations. There is evidence that negative selection results in virus-encoded proteins being not more variable than those of their hosts and vectors. Evidence suggests that small population diversity, and genetic stability, is the rule. Populations of plant viruses often consist of a few genetic variants and many infrequent variants. Their distribution may provide evidence of a population that is undifferentiated, differentiated by factors such as location, host plant, or time, or that fluctuates randomly in composition, depending on the virus.

  • Research Article
  • Citations: 7
  • 10.3390/pathogens11121470
Genetic Diversity of Viral Populations Associated with Ananas Germplasm and Improvement of Virus Diagnostic Protocols.
  • Dec 5, 2022
  • Pathogens
  • Adriana E Larrea-Sarmiento + 9 more

Pineapple (Ananas comosus L. [Merr.]) accessions from the U.S. Tropical Plant Genetic Resources and Disease Research (TPGRDR) in Hilo, Hawaii were subjected to RNA-sequencing to study the occurrence of viral populations associated with this vegetatively propagated crop. Analysis of high-throughput sequencing data obtained from 24 germplasm accessions and public domain transcriptome shotgun assembly (TSA) data identified two novel sadwaviruses, putatively named "pineapple secovirus C" (PSV-C) and "pineapple secovirus D" (PSV-D). They shared low amino acid sequence identity (from 34.8 to 41.3%) compared with their homologs in the Pro-pol region of the previously reported PSV-A and PSV-B. The complete genome (7485 bp) corresponding to a previously reported partial sequence of the badnavirus, pineapple bacilliform ER virus (PBERV), was retrieved from one of the datasets. Overall, we discovered a total of 69 viral sequences representing ten members within the Ampelovirus, Sadwavirus, and Badnavirus genera. Genetic diversity and recombination events were found in members of the pineapple mealybug wilt-associated virus (PMWaV) complex as well as PSVs. PMWaV-1, -3, and -6 presented recombination events across the quintuple gene block, while no recombination events were found for PMWaV-2. High recombination frequency of the RNA1 and RNA2 molecules from PSV-A and PSV-B were congruent with the diversity found by phylogenetic analyses. Here, we also report the development and improvement of RT-PCR diagnostic protocols for the specific identification and detection of viruses infecting pineapple based on the diverse viral populations characterized in this study. Given the high occurrence of recombination events, diversity, and discovery of viruses found in Ananas germplasm, the reported and validated RT-PCR assays represent an important advance for surveillance of viral infections of pineapple.

  • Research Article
  • Citations: 13
  • 10.1093/ve/veac039
Genetic diversity and connectivity of the Ostreid herpesvirus 1 populations in France: A first attempt to phylogeographic inference for a marine mollusc disease
  • Apr 23, 2022
  • Virus Evolution
  • Jean Delmotte + 10 more

The genetic diversity of viral populations is a key driver of the spatial and temporal diffusion of viruses; yet, studying the diversity of whole genomes from natural populations still remains a challenge. Phylodynamic approaches are commonly used for RNA viruses harboring small genomes but have only rarely been applied to DNA viruses with larger genomes. Here, we used the Pacific oyster mortality syndrome (a disease that affects oyster farms around the world) as a model to study the genetic diversity of its causative agent, the Ostreid herpesvirus 1 (OsHV-1) in the three main French oyster-farming areas. Using ultra-deep sequencing on individual moribund oysters and an innovative combination of bioinformatics tools, we de novo assembled twenty-one OsHV-1 new genomes. Combining quantification of major and minor genetic variations, phylogenetic analysis, and ancestral state reconstruction of discrete traits approaches, we assessed the connectivity of OsHV-1 viral populations between the three oyster-farming areas. Our results suggest that the Marennes-Oléron Bay represents the main source of OsHV-1 diversity, from where the virus has dispersed to other farming areas, a scenario consistent with current practices of oyster transfers in France. We demonstrate that phylodynamic approaches can be applied to aquatic DNA viruses to determine how epidemiological, immunological, and evolutionary processes act and potentially interact to shape their diversity patterns.

  • Research Article
  • 10.1093/ve/veaf082
Changes in intra-host mycovirus population diversity after vertical and horizontal transmission
  • Jan 9, 2025
  • Virus Evolution
  • Karla Peranić + 7 more

The remarkable speed at which viral populations mutate allows them to evolve quickly, so that the viral diversity can change, especially when the virus is transmitted, i.e. its population goes through a bottleneck. Our experiments assessed the diversity of the intra-host populations of a mycovirus Cryphonectria hypovirus 1 (CHV1), a natural biocontrol agent of chestnut blight disease, using PacBio long-read HiFi sequencing. The intra-host viral population diversity before and after either vertical or horizontal transmission was estimated using two metrics—nucleotide (mutational) diversity measured as π, and viral variant diversity measured as Nei’s H. A significant bottleneck effect, demonstrated by the decline of the mutational diversity (π), was observed after vertical transmission of prototypical viral populations into conidia, in both investigated viral subtypes, French 1 (F1) and Italian (I). In contrast, the number of viral variants was significantly reduced after the vertical transmission of subtype I but increased for the subtype F1. In newly isolated fungal strains infected with CHV1 subtype I, fewer viral variants were vertically transferred into conidia, relative to the prototypical laboratory isolates, i.e. the average number of transmitted viral variants was smaller. In the horizontal viral transmission assays, the number of transmitted viral variants was closely linked to the genotype of the fungal host at the vegetative compatibility loci. Specifically, recipient viral populations’ diversity was greater when the alleles at loci vic2 and vic3 were the same in the donor and recipient fungal isolate, relative to when they were different. Heteroallelism at the vic4 locus had no impact on viral populations’ diversity. Despite the strong bottlenecks, purifying selection shaped the diversity of intra-host CHV1 populations. In both transmission experiments on average, synonymous mutational diversity was higher than non-synonymous, across all replicates. 
Signs of positive selection or mutation accumulations, inferred by a surplus of nonsynonymous mutations, were less common and mostly observed during vertical transmission experiments, i.e. in new viral populations arising from conidia.
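
The two metrics named in this abstract can be illustrated with their textbook estimators. The Python sketch below uses the standard unbiased forms (per-site average pairwise difference for π, and Nei's gene diversity over variant counts for H); it is an assumption that these match the exact pipeline used in the study, and the function names and example counts are hypothetical.

```python
def unbiased_heterozygosity(counts):
    """n/(n-1) * (1 - sum(p_i^2)): probability that two distinct sampled
    items differ, given the count of each category."""
    n = sum(counts)
    if n < 2:
        return 0.0
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))


def nucleotide_diversity(site_counts):
    """pi: mean over sites of the per-site pairwise difference,
    where each entry holds the A, C, G, T counts at one site."""
    return sum(unbiased_heterozygosity(c) for c in site_counts) / len(site_counts)


def neis_h(variant_counts):
    """Nei's H computed over counts of whole-sequence variants."""
    return unbiased_heterozygosity(variant_counts)


# Hypothetical data: two sites covered by 10 reads, and three viral variants.
pi = nucleotide_diversity([[10, 0, 0, 0], [5, 5, 0, 0]])
h = neis_h([8, 1, 1])
```

The two measures can move in opposite directions after a bottleneck: π tracks how different the sequences are on average, while H tracks how evenly reads are spread across distinct variants.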

  • Research Article
  • Citations: 6
  • 10.1371/journal.pone.0199494
Comparison of chikungunya viruses generated using infectious clone or the Infectious Subgenomic Amplicons (ISA) method in Aedes mosquitoes.
  • Jun 28, 2018
  • PLOS ONE
  • Souand Mohamed Ali + 5 more

Reverse genetics systems provide the opportunity to manipulate viral genomes and have been widely used to study RNA viruses and to develop new antiviral compounds and vaccine strategies. The recently described method called ISA (Infectious Subgenomic Amplicons) makes it possible to rescue RNA viruses in days. We demonstrated in cell culture that the use of the ISA method led to a higher genetic diversity of viral populations than that observed using infectious clone technology. However, no replicative fitness difference was observed. In the present study, we used the chikungunya virus as a model to compare, in Aedes aegypti and Aedes albopictus mosquitoes, the genotypic and phenotypic characteristics of viruses produced either from an infectious clone or using the ISA method. We confirmed the results found in cellulo: the use of the ISA method was associated with higher genetic diversity of viral populations in mosquitoes but did not affect vector competence, validating its use for in vivo experiments.

  • Research Article
  • Citations: 5
  • 10.1016/j.antiviral.2017.01.001
Tracking HCV protease population diversity during transmission and susceptibility of founder populations to antiviral therapy
  • Jan 3, 2017
  • Antiviral Research
  • Tanvi Khera + 12 more

  • Research Article
  • Citations: 22
  • 10.1128/jvi.01590-19
Rapid Dissemination and Monopolization of Viral Populations in Mice Revealed Using a Panel of Barcoded Viruses.
  • Jan 6, 2020
  • Journal of Virology
  • Broc T Mccune + 3 more

The gastrointestinal tract presents a formidable barrier for pathogens to initiate infection. Despite this barrier, enteroviruses, including coxsackievirus B3 (CVB3), successfully penetrate the intestine to initiate infection and spread systemically prior to shedding in stool. However, the effect of the gastrointestinal barrier on CVB3 population dynamics is relatively unexplored, and the selective pressures acting on CVB3 in the intestine are not well characterized. To examine viral population dynamics in orally infected mice, we produced over 100 CVB3 clones harboring nine unique nucleotide "barcodes." Using this collection of barcoded viruses, we found diverse viral populations throughout each mouse within the first day postinfection, but by 48 h the viral populations were dominated by fewer than three barcoded viruses in intestinal and extraintestinal tissues. Using light-sensitive viruses to track replication status, we found that diverse viruses had replicated prior to loss of diversity. Sequencing whole viral genomes from samples later in infection did not reveal detectable viral adaptations. Surprisingly, orally inoculated CVB3 was detectable in pancreas and liver as soon as 20 min postinoculation, indicating rapid systemic dissemination. These results suggest rapid dissemination of diverse viral populations, followed by a major restriction in population diversity and monopolization in all examined tissues. These results underscore a complex dynamic between dissemination and clearance for an enteric virus.

IMPORTANCE: Enteric viruses initiate infection in the gastrointestinal tract but can disseminate to systemic sites. However, the dynamics of viral dissemination are unclear. In this study, we created a library of 135 barcoded coxsackieviruses to examine viral population diversity across time and space following oral inoculation of mice. Overall, we found that the broad population of viruses disseminates early, followed by monopolization of mouse tissues with three or fewer pool members at later time points. Interestingly, we detected virus in systemic tissues such as pancreas and liver just 20 min after oral inoculation. These results suggest rapid dissemination of diverse viral populations, followed by a major restriction in population diversity and monopolization in all examined tissues.

  • Research Article
  • Citations: 38
  • 10.1128/msphere.00279-16
Norovirus Polymerase Fidelity Contributes to Viral Transmission In Vivo
  • Oct 19, 2016
  • mSphere
  • Armando Arias + 4 more

Intrahost genetic diversity and replication error rates are intricately linked to RNA virus pathogenesis, with alterations in viral polymerase fidelity typically leading to attenuation during infections in vivo. We have previously shown that norovirus intrahost genetic diversity also influences viral pathogenesis using the murine norovirus model, as increasing viral mutation frequency using a mutagenic nucleoside resulted in clearance of a persistent infection in mice. Given the role of replication fidelity and genetic diversity in pathogenesis, we have now investigated whether polymerase fidelity can also impact virus transmission between susceptible hosts. We have identified a high-fidelity norovirus RNA-dependent RNA polymerase mutant (I391L) which displays delayed replication kinetics in vivo but not in cell culture. The I391L polymerase mutant also exhibited lower transmission rates between susceptible hosts than the wild-type virus and, most notably, another replication defective mutant that has wild-type levels of polymerase fidelity. These results provide the first experimental evidence that norovirus polymerase fidelity contributes to virus transmission between hosts and that maintaining diversity is important for the establishment of infection. This work supports the hypothesis that the reduced polymerase fidelity of the pandemic GII.4 human norovirus isolates may contribute to their global dominance.

IMPORTANCE: Virus replication fidelity and hence the intrahost genetic diversity of viral populations are known to be intricately linked to viral pathogenesis and tropism as well as to immune and antiviral escape during infection. In this study, we investigated whether changes in replication fidelity can impact the ability of a virus to transmit between susceptible hosts by the use of a mouse model for norovirus. We show that a variant encoding a high-fidelity polymerase is transmitted less efficiently between mice than the wild-type strain. This constitutes the first experimental demonstration that the polymerase fidelity of viruses can impact transmission of infection in their natural hosts. These results provide further insight into potential reasons for the global emergence of pandemic human noroviruses that display alterations in the replication fidelity of their polymerases compared to nonpandemic strains.

  • Research Article
  • Citations: 17
  • 10.1128/jvi.75.14.6729-6736.2001
Population genetic analysis of the protease locus of human immunodeficiency virus type 1 quasispecies undergoing drug selection, using a denaturing gradient-heteroduplex tracking assay.
  • Jul 15, 2001
  • Journal of Virology
  • L Doukhan + 1 more

Monitoring the evolution of human immunodeficiency virus type 1 (HIV-1) drug resistance requires measuring the frequency of closely related genetic variants making up the complex viral quasispecies found in vivo. In order to resolve both major and minor (≥2%) protease gene variants differing by one or more nucleotide substitutions, we analyzed PCR products derived from plasma viral quasispecies by using a combination of denaturing gradient gel electrophoresis and DNA heteroduplex tracking assays. Correct population sampling of the high level of genetic diversity present within viral quasispecies could be documented by parallel analysis of duplicate, independently generated PCR products. The composition of genetically complex protease gene quasispecies remained constant over short periods of time in the absence of treatment and while plasma viremia fell >100-fold following the initiation of protease inhibitor ritonavir monotherapy. Within a month of initiating therapy, a strong reduction in the genetic diversity of plasma viral populations at the selected protease locus was associated with rising plasma viremia and the emergence of drug resistance. The high levels of protease genetic diversity seen before treatment reemerged only months later. In one patient, reduction in genetic diversity at the protease gene was observed concomitantly with an increase in diversity at the envelope gene (E. L. Delwart, P. Heng, A. Neumann, and M. Markowitz, J. Virol. 72:2416-2421, 1998), indicating that opposite population genetic changes can take place in different HIV-1 loci. The rapid emergence of drug-resistant HIV-1 was therefore associated with a strong, although only transient, reduction in genetic diversity at the selected locus. The denaturing gradient-heteroduplex tracking assay is a simple method for the separation and quantitation of very closely related, low-frequency, genetic variants within complex viral populations.

  • Research Article
  • Citations: 11
  • 10.1371/journal.pone.0189554
Fitness peaks of dengue virus populations
  • Jan 2, 2018
  • PLOS ONE
  • Wen Jun Liu + 1 more

The role of intra-host genetic diversity in dengue viral populations remains a topic of debate, particularly the impact on transmission of changes in this diversity. Several approaches have been taken to increasing and decreasing the genetic diversity of populations of RNA viruses and have drawn what appear to be contradictory conclusions. A 2–6 fold increase in genetic diversity of a wild type population of dengue virus serotype 1 (DENV1) and of an infectious clone population derived from the wild type population, produced by treatment with the nucleotide analogue 5-fluorouracil (5FU), drove the populations to extinction. Removal of 5FU immediately prior to extinction resulted in a return to pre-treatment levels of fitness and genetic diversity, albeit with novel single nucleotide polymorphisms. These observations support the concept that DENV populations exist on fitness peaks determined by their transmission requirements and either an increase or a decrease in genetic diversity may result in a loss of fitness.

  • Research Article
  • Citations: 82
  • 10.1186/1471-2156-10-84
A simple method for estimating genetic diversity in large populations from finite sample sizes
  • Dec 1, 2009
  • BMC Genetics
  • Stanislav Bashalkhanov + 2 more

Background: Sample size is one of the critical factors affecting the accuracy of the estimation of population genetic diversity parameters. Small sample sizes often lead to significant errors in determining the allelic richness, which is one of the most important and commonly used estimators of genetic diversity in populations. Correct estimation of allelic richness in natural populations is challenging since they often do not conform to model assumptions. Here, we introduce a simple and robust approach to estimate the genetic diversity in large natural populations based on the empirical data for finite sample sizes.

Results: We developed a non-linear regression model to infer genetic diversity estimates in large natural populations from finite sample sizes. The allelic richness values predicted by our model were in good agreement with those observed in the simulated data sets and the true allelic richness observed in the source populations. The model has been validated using simulated population genetic data sets with different evolutionary scenarios implied in the simulated populations, as well as large microsatellite and allozyme experimental data sets for four conifer species with contrasting patterns of inherent genetic diversity and mating systems. Our model was a better predictor for allelic richness in natural populations than the widely-used Ewens sampling formula, coalescent approach, and rarefaction algorithm.

Conclusions: Our regression model was capable of accurately estimating allelic richness in natural populations regardless of the species and marker system. This regression modeling approach is free from assumptions and can be widely used for population genetic and conservation applications.
