Abstract

Fungal barcoding, that is the use of genetic markers to identify fungal species, has contributed enormously to the rise of mycorrhizal research in the last decade (van der Heijden et al., 2015) because it allows quick and easy en masse identification of species or higher taxonomic ranks and grouping of sequences into entities; this speeds up ecological analyses and the discovery of new species (Hibbett et al., 2011; Herr et al., 2015). Marker sequences allow the delineation of operational taxonomic units (OTUs) on the basis of sequence similarities (Lindahl et al., 2013). Specific cases of OTUs delimited across datasets are internal transcribed spacer (ITS)-based fungal Species Hypotheses (SH; Kõljalg et al., 2013) and small subunit (SSU) rRNA gene-based Glomeromycota Virtual Taxa (VT; Öpik et al., 2014), which allow the stable naming and assembly of databases of sequence-associated ecological metadata (SH; UNITE, Kõljalg et al., 2013; VT; MaarjAM, Öpik et al., 2010). Such OTUs allow us to answer ecological questions, such as the exploration of diversity patterns between habitats, hosts or experimental treatments (Merckx et al., 2012; Ohsowski et al., 2014), and to describe distribution ranges and large-scale patterns (Tedersoo et al., 2014; Davison et al., 2015). Moreover, the phylogenetic signal conveyed by the sequences can detect phylogenetic diversity patterns and identify phylogenetic signals in functional traits of organisms and communities (Martos et al., 2012; Grilli et al., 2015). Mycorrhizal fungal research often uses the nuclear ribosomal ITS (Box 1), now the most acknowledged taxonomic barcode for fungi (Schoch et al., 2012), especially in ectomycorrhizal (ECM) models. 
Molecular ecological research on Glomeromycota, the arbuscular mycorrhizal (AM) fungi, uses a broader range of markers, among which the nuclear SSU rRNA gene is most often used in community surveys, whilst the ITS and large subunit (LSU) rRNA genes are more common in taxonomic research into Glomeromycota (Öpik et al., 2014). Alternative markers, such as the beta-tubulin gene, COI or RPB1 genes, are also being tested (reviewed in Öpik et al., 2014 and Hart et al., 2015). The fungal ribosomal DNA (rDNA) locus contains tens to hundreds of tandem repeats each of which encodes the four ribosomal RNA genes separated by spacers. Within each repeat, the internal transcribed spacer (ITS) is located between the small-subunit (18S rRNA) and large-subunit (28S rRNA) ribosomal RNA genes and includes the 5.8S rRNA gene. The multi-copy structure of rDNA facilitates its amplification by PCR, and makes it very suitable for taxonomic barcoding, but raises questions about its behaviour as a single, formal Mendelian genetic marker. Indeed, a mutation in one repeat or a meiotic crossing-over within the rDNA locus may result in heterogeneous repeats on the same chromosome, as observed in some fungi (Lindner et al., 2013). But in most cases a mechanism called concerted evolution acts to homogenize the repeats within a chromosome (Eickbush & Eickbush, 2007; Ganley & Kobayashi, 2011). Although poorly investigated, concerted evolution probably acts by way of unequal mitotic crossing-overs between sister-chromatids (Naidoo et al., 2013), and so amplifies one random repeat variant at the expense of the others on the chromosome resulting in removal of the repeat heterogeneity. Concerted evolution homogenizes repeats within a chromosome, so that homogeneous ITS sequences can be obtained from haploid strains even in polymorphic populations (Riccioni et al., 2008). 
Yet concerted evolution cannot act between homologous chromosomes, because vegetative nuclei remain haploid in Ascomycota and Basidiomycota (even in dikaryotic cells where by definition haploid nuclei are separated). Thus, rDNA heterozygotes can be stable and segregate in a Mendelian way at meiosis (Selosse et al., 1996; Martin et al., 1999), with the possibility of crossing-over within the rDNA locus creating transiently heterogeneous repeats. The Mendelian behaviour of rDNA, in spite of its repetitive structure, has allowed its use as a marker for population genetics, for example ITS polymorphism among individuals of a population as a neutral marker to study population structure (Riccioni et al., 2008) or the polymorphic intergenic spacer (IGS) between the 28S and 5S rRNA genes to study Hardy–Weinberg equilibrium (Roy et al., 2008; Vincenot et al., 2012). In some species, relaxation of concerted evolution (which allows heterogeneous repeats) or duplication of the rDNA locus (which disrupts concerted evolution) can occur: this leads to apparent heterozygosities or even more complex genetic patterns (Eickbush & Eickbush, 2007; Ganley & Kobayashi, 2011). Whenever an excess of rDNA homozygotes is observed in outcrossing Basidiomycota populations (as in the main text), concerted evolution or its relaxation cannot explain it. We propose that the way sequences are produced and corrected for taxonomic barcoding is masking some heterozygosities. While the data processing that delivers the final sequence in the International Nucleotide Sequence Database (INSD) is often well suited to the tasks outlined earlier, we suggest that it sometimes fails in terms of biological relevance. The way data are obtained, analysed (by algorithms and pipelines), manually curated and assembled for deposition in INSD can sometimes mask or recombine the original genetic diversity, whose preservation would be extremely useful in other studies on fungal biology. 
Sequence data in INSD represent hypotheses, not facts, and we suggest that it is possible to increase the biological relevance of such data. Two major examples follow, considering, respectively, ECM and AM fungi: we describe how part of the fungal biology is masked, especially intraspecific sequence diversity and hallmarks of sexual recombination, and we outline possible solutions. ITS sequences are part of the ribosomal DNA locus that behaves as a single Mendelian locus (Box 1). ITS-based fungal OTUs are most commonly defined at the 97% sequence similarity threshold (Lindahl et al., 2013): although this also accounts for sequencing errors, it acknowledges the existence of ITS polymorphism within most fungal species. The 97% threshold is debatable in value, but not really in its essence. Moreover, in an elegant approach, Hughes et al. (2009) identified how divergent the ITS region was in biological species (i.e. species defined as a group of organisms capable of successful interbreeding) of Basidiomycota by analysing heterozygosity in ITS sequences of dikaryotic fruiting bodies. They found that the divergence between alleles from a given fruiting body was most of the time under 3%, so that more divergent alleles were from non-recombining taxa. Most biological species can thus be defined using the 97% similarity threshold, at least within Basidiomycota. Moreover, heterozygosity should be observed in some individuals, and heterozygotes should occur at a certain frequency within populations, at least for outcrossing dikaryotic Basidiomycota (this has fewer implications for autogamous species and Ascomycota, which are generally haploid at the vegetative stage). Strikingly, most ITS sequences of Basidiomycota deposited in INSD show no ambiguous, heterozygous positions. 
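The 97% similarity criterion discussed above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length; the function names (`pairwise_identity`, `same_otu`) are hypothetical and are not taken from any OTU-picking pipeline, which would normally perform its own alignment and clustering.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment (equal length).
    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def same_otu(seq_a: str, seq_b: str, threshold: float = 0.97) -> bool:
    """True if the two sequences meet the similarity threshold (default 97%)."""
    return pairwise_identity(seq_a, seq_b) >= threshold


# Toy 100-bp sequences (hypothetical data, not real ITS sequences).
reference = "ACGT" * 25
two_mismatches = "TT" + reference[2:]     # differs at 2 of 100 positions
four_mismatches = "TTAA" + reference[4:]  # differs at 4 of 100 positions

print(same_otu(reference, two_mismatches))   # 98% identity
print(same_otu(reference, four_mismatches))  # 96% identity
```

In this toy case the first pair falls within the 97% threshold and the second does not; two alleles of one heterozygous fruiting body would be compared in the same way.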
ITS sequences from single species samples such as fruiting bodies or fungal cultures are, so far, mostly obtained by Sanger sequencing directly applied to PCR products, that is, without a cloning step to separate different amplicons, if present. We propose that the lack of heterozygous positions in Basidiomycota is partly due to the use of automated algorithms that de-convolute raw chromatography data and ‘call’ nucleotides to produce the chromatogram accessible to the annotators: default settings do not aim at detecting heterozygosities and can exclude some ambiguities. Moreover, although we do not mean that heterozygosity is deliberately masked by manual corrections, scientists sometimes see ‘ambiguous’ bases (Y, W, …) as a quality problem. However, automated or not, corrections of ‘ambiguities’ can mask the natural genetic variation occurring in the studied organisms, and the opportunity to store such valuable information for the scientific community is missed. Indeed, population genetics studies have shown that many ECM basidiomycetes are randomly outcrossing (panmixia; Douhan et al., 2011), meaning that heterozygosity should not be rare when ITS is polymorphic, according to the Hardy–Weinberg equilibrium hypothesis. We analysed ITS sequence data retrieved from INSD for populations of three basidiomycetes that show strong cues for panmixia on the basis of other loci, namely Laccaria amethystina (Vincenot et al., 2012), Amanita phalloides (Pringle et al., 2009), and Tricholoma populinum (Grubisha et al., 2012). For each population from specific geographic regions, we calculated the diversity of ITS alleles and the observed frequency of heterozygotes (Table 1). We then compared it with the expected frequency of heterozygotes, assuming panmixia, using two mutation models. We observed a clear deficit of heterozygotes in INSD among most populations of the investigated fungal species (Table 1). 
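The comparison of observed and expected heterozygote frequencies can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain Hardy–Weinberg expectation computed from allele frequencies (1 minus the sum of squared frequencies), not the two mutation models used for Table 1, and the genotype data below are invented for the example.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of (allele_1, allele_2) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(frequencies):
    """Hardy-Weinberg expectation under panmixia: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in frequencies.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


# Hypothetical population of 10 dikaryotic individuals typed at the ITS locus.
population = [("A", "A")] * 6 + [("B", "B")] * 3 + [("A", "B")] * 1

freqs = allele_frequencies(population)
h_exp = expected_heterozygosity(freqs)
h_obs = observed_heterozygosity(population)
print(f"expected: {h_exp:.3f}, observed: {h_obs:.3f}")
```

With these invented genotypes the expected frequency (0.455) clearly exceeds the observed one (0.100), which is the kind of heterozygote deficit described above.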
The expected heterozygote frequencies ranged from slightly lower than the observed frequency (T. populinum in Scandinavia) to exceeding the observed levels 19-fold in the case of L. amethystina investigated by two authors of the present Letter (Vincenot et al., 2012). In fact, no heterozygous sequence exists in INSD for A. phalloides. We carefully re-analysed chromatograms still available for L. amethystina and A. phalloides. Twelve ITS chromatograms from Pringle et al. (2009; kindly provided by the authors) revealed no heterozygous individuals in North American A. phalloides (Table 1). Since this species is introduced and invasive in North America, we suspect that a founder effect may reduce genetic diversity and enhance inbreeding, and indeed few ITS alleles were present. Eight ITS chromatograms from Vincenot et al. (2012) revealed a higher frequency of heterozygotes than in the data deposited in INSD (Table 1) for L. amethystina in Europe, although the observed heterozygote frequency was still half the expected one, perhaps due to the low number of chromatograms available for re-analysis. Finally, more insidious technical reasons may account for missing heterozygotes, such as unequal amplification of the two alleles during PCR, for example because the alleles differ in length: the masking can thus arise from the molecular sequencing technology itself. Thus, there is evidence that sequence editing can mask the hallmark of panmixia, creating an apparent excess of homozygotes in INSD for Basidiomycota. To avoid this, one may allow for more heterozygosity during the base calling process of raw sequencing chromatograms from Sanger sequencing. We recommend the use of software such as Champuru (Flot, 2007) that is designed to extract allelic sequences from chromatograms of heterozygotes. 
When this is not possible, the fact that sequences were corrected by forcing for homozygosity should be made available in the annotations of the deposited sequences, so that such sequences can be excluded from population genetics analyses. In the near future, the use of next-generation sequencing (NGS) will obviously solve the problem by providing access to the sequences of the two alleles in the case of heterozygosity, since NGS approaches sequence the DNA molecules individually unlike Sanger sequencing. Importantly, we call for each allelic sequence to be deposited in the public depositories and we encourage authors to annotate their sequences regarding the sequencing platform and nucleotide calling software and assumptions, to allow future population genetics analyses. AM fungi possess high intrasporal and hence intra-hyphal genetic variation in marker genes and in the genome in general (Young, 2015). The underlying reasons for this remain a matter of debate, however: it may be due either to the multiple divergent nuclei per spore/individual, or to divergent alleles in nuclei (Young, 2015). Whichever is the case, it is important to capture such variation within and among AM fungal isolates. Therefore, when characterizing a new Glomeromycota species by sequencing PCR products of marker genes from cultured spores, it is necessary to separate the sequence variants within the amplicon pool and to sequence them individually. This can be done by cloning followed by Sanger sequencing or by NGS approaches (see earlier). The same approaches can be used for detection and identification of AM fungi in environmental samples (Hart et al., 2015). However, a widespread practice in AM fungal taxonomic studies is to report a single representative sequence per species or per culture. 
Such single sequences submitted to a public repository may be a randomly selected sequence from direct sequencing of PCR products, a concatenation of multiple flanking clones joined by conserved portions of a longer amplicon, or a consensus of multiple divergent versions of an amplicon condensing the variable nucleotide positions as degenerate or consensus bases. The delineation of actual alleles is then made impossible. The latter two practices are also sometimes used at an alignment and phylogenetic analysis phase, to generate reference sequence sets, as done by Krüger et al. (2012) for instance. Although the latter work is an invaluable resource for ecological research and barcoding of mycorrhizal fungi, the risk is that such practices mask information about the biology and natural variation of AM fungi in the following ways. First, a single representative sequence per culture, species or OTU does not contain information about intraspecific genetic variation, although such information was often obtained. Therefore, species ‘boundaries’ cannot be properly estimated. Second, due to the known high intrasporal sequence variation in the commonly used AM fungal marker genes (Thiéry et al., 2012), a concatenation of individual flanking clones has a high probability of joining sequence fragments originating in different haplotypes, thus generating a chimeric sequence. Parts of such sequences do indeed exist in the source organism, but the biological relevance of the entire composite sequence is questionable. Third, generation of consensus sequences would smooth out the polymorphisms present in the variable positions, as would PCR product sequencing and provision of the resulting sequences without information about polymorphic positions that would be detectable from the chromatograms, or by separately sequencing individual variants upon cloning or by NGS. Why is information about intraspecific variation of AM fungi so important? 
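Why a consensus with degenerate bases prevents the delineation of actual alleles can be shown with a small sketch. It assumes gap-free, aligned sequences of equal length and uses the standard IUPAC nucleotide ambiguity codes; the function names and the toy haplotypes are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_consensus(sequences):
    """Collapse aligned sequences into one sequence with IUPAC codes."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        BASES_TO_CODE[frozenset(column)] for column in zip(*sequences)
    )


def compatible_haplotypes(consensus):
    """All unambiguous sequences that the consensus could stand for."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in consensus))]


# Two toy haplotypes differing at two positions (invented data).
haplotypes = ["ACGT", "ATGA"]
consensus = degenerate_consensus(haplotypes)
candidates = compatible_haplotypes(consensus)

print(consensus)
print(candidates)
print([c for c in candidates if c not in haplotypes])
```

Here two real haplotypes collapse into a single consensus that is equally compatible with four sequences, two of which never existed in the source organism. The linkage between variable positions is lost, and with it any signal of recombination or its absence.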
It is needed to develop guidelines for appropriate species delimitation in taxonomic studies, for automated or manual OTU delimitation in diversity surveys, and therefore for the best possible understanding of the DNA-based diversity of AM fungi (Hart et al., 2015). To illustrate these points, we aligned the SSU rRNA gene, ITS and LSU rRNA gene (Sanger) sequences of selected Diversisporaceae from the MaarjAM database (downloaded 30 July 2015), together with the consensus sequences from the reference sequence sets of Krüger et al. (2012; available at http://schuessler.userweb.mwn.de/amphylo/), and representatives of ITS-based SH from UNITE database version 7 (Kõljalg et al., 2013). The phylogenetic analyses performed on these alignments (Fig. 1; Supporting Information Fig. S1) clearly show that there is a wealth of sequence variation present in isolates and species within these common Glomeromycota barcoding marker regions. Information about intraspecific sequence variation permits appropriate assignment of environmental sequences to a species, VT, OTU or SH, whilst lack of it may result in lumping or splitting biologically informative taxa. Moreover, sequence variability within or among loci can be used to test for the presence of recombination. This is a crucial question in AM fungi for which sexuality remains undiscovered, though some components of the recombination toolbox have been reported (Tisserant et al., 2013; Riley et al., 2014). Concatenating DNA fragments originating from different individuals or nuclei creates the hallmark of biological recombination and masks potential evidence for asexuality (lack of recombination) in AM fungi. 
For the reasons mentioned earlier, when working with AM fungi we recommend sequencing multiple clones (or molecules in the case of NGS) of any marker per spore, several spores per culture and several cultures per species, and submitting to sequence databases representative sequences of each sequence variant detected. In ecological studies, submission of multiple sequences per OTU or VT is recommended to report and display sequence variation within them (Hart et al., 2015). In automated OTU generation, it is wise practice to validate multiple representative sequences per OTU/VT to ensure appropriate taxonomy assignment of the OTUs (Lindahl et al., 2013) and to gain understanding of within-OTU variation. It is very important that future NGS analysis pipelines include the option of extracting such representative sequences. Current research on fungal diversity comprises an increasing number of increasingly complex steps, such as molecular laboratory procedures (cloning in AM fungi) or data processing steps (e.g. chromatogram interpretation after Sanger sequencing or OTU-picking in NGS bioinformatics pipelines) that are often automated. We should be aware that some ‘facts’ resulting from these steps, such as sequences or OTUs, are no longer exactly facts, but already interpretations of the original raw data. Furthermore, these interpretations can mask some biological facts, such as the presence of heterozygosities, of genetically divergent nuclei in a cytoplasm or of divergent alleles in a nucleus. The evidence for occurrence (or absence) of sexual recombination can be obliterated. The way we report scientific data should provide as much raw data as possible and, as a minimum, carefully describe the filters and analytical procedures used to generate the data. Submission of raw or even intermediate data files (Drew et al., 2013) is now required by many journals, and we recommend this for sequences as well. 
We must avoid handling of barcoding data that erases the biological facts that contemporary and future scientists who pursue different biological questions and problems may wish to investigate.
M-A.S. is supported by the Muséum national d'Histoire naturelle and the Fondation de France; M.Ö. is supported by the Estonian Research Council (grants 9050, IUT20-28). The authors thank Anne Pringle, Ian Dickie, Nhu Nguyen and an anonymous reviewer for their comments on this paper.
M-A.S. and M.Ö. planned the first version of the paper, and then elaborated a second version after discussions with L.V., who suggested and made analyses for Table 1. M.Ö. made the analyses for Fig. 1.
Please note: Wiley Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
Fig. S1 Maximum likelihood phylogenetic trees of Fig. 1 showing full details of the sequence names.
Notes S1 Alignment of SSU rRNA gene sequences used to compute Fig. 1(a).
Notes S2 Tree file of Fig. 1(a).
Notes S3 Alignment of ITS sequences used to compute Fig. 1(b).
Notes S4 Tree file of Fig. 1(b).
Notes S5 Alignment of LSU rRNA gene sequences used to compute Fig. 1(c).
Notes S6 Tree file of Fig. 1(c).
