Abstract

In this increasingly data-rich era of scientific enquiry, attention is turning to the value of archiving information in the form of publicly available, searchable databases, to permit future analyses without the prior need to acquire additional data. The establishment of repositories is facilitated by the continuing decrease in computational overhead enshrined in Moore's law [1], and therefore offers considerable and increasing value in situations where sample collection is rate-limited, due to costs, legal restrictions, unavailability of collection expertise, or other factors.

Such a situation exists in flow cytometric analyses of plant genome sizes. Worldwide, we estimate that there are approximately 330,000–400,000 species of flowering plants, including those yet to be discovered [2]; only a very small percentage of these have been characterized with any degree of sophistication, whether at the ecological, agronomic, morphological, physiological, molecular, or cellular level. Given the impact of the Anthropocene, increasing rates of species extinction pose the real risk that we will lose entire plant species before their currently unsuspected importance is discovered [3]. Archiving specimens and the data already derived from them provides one small way in which we can address this problem, by providing a molecular-taxonomic inventory of all extant plant species.

However, how representative and reliable are the stored measurements? A significant problem with archiving data for future use relates to unforeseen and unrecorded variables. As cytometrists, our interest naturally focuses on measurement variables, and on the performance of the associated instrumentation and methods of measurement. Our example here is the measurement of plant genome sizes, defined operationally as the 2C nuclear DNA content, in pg, of somatic cells.
This information has become readily accessible using flow cytometry (FCM) [4] and, with the global spread of this particular application, the number of publications reporting plant genome sizes has rapidly increased. Recognizing this, archives of genome size measurements emerged, one of the most comprehensive of which is the RBG Kew Plant C-value database [5]. This database contains a compilation of genome size values harvested from the primary scientific literature, representing the contributions of many different laboratories located around the world. These values are sorted in terms of measurement methods (most involve FCM), presented in the form of ranges and global average values, and manually curated in terms of “prime values” for each species, as well as other estimates also reported for the species if available.

Prime values were originally described as those which represent “the most consistent value obtained under best-practice methods (as originally defined by Bennett & Smith, 1976)” [6]. Since this reference predates the use of flow cytometry, the criteria for defining prime values have been adapted and refined empirically (Leitch IJ: Pers. Commun.). Thus, the following questions are asked: (a) Did the study follow best practices: what DNA-specific staining procedure was employed, was there a suitable number of replicates, good CVs, and internal calibration, and was a suitable reference standard used (i.e., a species having a genome less than three times the size of the species of interest)? (b) Were chromosome counts made on the same materials used to estimate the genome size? (c) What is the reputation of the laboratories generating the data, and how reliable are they considered to be? (d) Does a herbarium voucher exist of the analyzed material?
In general, estimates made using flow cytometry have been considered to be “more reliable”, and hence were, and continue to be, selected in preference to those estimated using Feulgen microdensitometry. Nevertheless, over time it became increasingly hard to distinguish between the estimates due to the comparably high quality of data coming from an increasing number of groups. Thus, for stability of the lists, if an estimate for a species had previously been designated as prime, it was retained unless there was a compelling reason to change it. Currently, the Kew Plant C-value database (release 7.1; https://cvalues.science.kew.org/) contains data for 12,273 species, comprising 10,770 angiosperms, 421 gymnosperms, 303 pteridophytes (246 ferns and fern allies and 57 lycophytes), 334 bryophytes, and 445 algae.

The availability of this data now makes it possible to directly test the relationship between the crowd-sourced prime genome size values and measurements made in a single laboratory under controlled conditions. This effectively examines whether the curated, crowd-sourced data (genome size values within the C-value database measured in many different laboratories, using different instruments and experimental conditions, and calibration standards with different assumed genome sizes for converting relative measurements into absolute amounts) is a useful permanent record. To test this, we employed the Beckman Coulter CytoFLEX for the analysis of homogenates of four species, staining with propidium iodide/ribonuclease [7] (for methods specific to the CytoFLEX, see https://www.beckman.com/resources/reading-material/application-notes/plant-genome-size-flow-cytometry-analysis also provided as Appendix S2). The Kew Plant C-value prime values for the four species span a range of 0.32 pg DNA (the 2C value for A. thaliana leaf cells) to 101.12 pg (the 32C value for endoreduplicated pericarp cells of Capsicum annuum); about 95% of the plant nuclear DNA content 2C-values in the Kew database fall between 0.32 and 101.12 pg.

Figure 1 illustrates the experimental strategy with arabidopsis: a biparametric plot of side scatter versus PI (area) fluorescence (Figure 1A) reveals five clusters of nuclei, equally spaced across the PI-dimension (log scale). Figure 1B illustrates the time-dependency of PI fluorescence, gating on Region P1 of Figure 1A. Further gating (Region P2, Figure 1B) to include only those nuclei whose fluorescence remains constant over time provides the uniparametric histogram of the individual classes of endoreduplicated nuclei (Figure 1C). Table 1 lists summary values for the fluorescence values representing the positions of these peaks, the associated CVs, and the corresponding prime DNA content values taken from the Kew Plant C-value database.

From these experiments, we make a number of observations. First, the CV values for these peaks are remarkably low and consistent, in all cases <2%, even for the largest nuclear genome (1.19%; pepper, 101.12 pg, 32C). Second, endoreduplication in arabidopsis, spanning a low range of DNA contents (0.32–2.56 pg), and in pepper, spanning a higher range (6.32–101.12 pg), results in an almost perfect linear correspondence between peak positions and DNA content (r² values of 1.0 and 0.9999, respectively) (Figure 2). Finally, the combined regression of peak positions against the DNA content values from the Kew database, across all species, also gives an almost perfect correspondence (r² = 0.9997). The line of best fit passes through the origin (0,0), which indicates an absence of systematic error in the measurements, and no unusual species-specific deviations are noted.
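As a rough illustration of the linearity check described above, the sketch below fits peak positions against expected DNA contents by ordinary least squares. The DNA contents are the doubling series implied by the ranges quoted in the text (assuming one nuclear class per doubling from 0.32 to 2.56 pg and from 6.32 to 101.12 pg). The peak positions, the constant `ASSUMED_GAIN`, and the deviations are invented placeholders, not the values in Table 1, and the helper names are hypothetical.

```python
# Illustrative only: DNA contents follow the doubling series quoted in the
# text; the peak positions are HYPOTHETICAL placeholders, not Table 1 data.

def doubling_series(c2_pg, n_classes):
    """Expected DNA contents (pg) for 2C, 4C, 8C, ... given the 2C value."""
    return [c2_pg * 2 ** k for k in range(n_classes)]


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


arabidopsis_pg = doubling_series(0.32, 4)  # 0.32 ... 2.56 pg (2C to 16C)
pepper_pg = doubling_series(6.32, 5)       # 6.32 ... 101.12 pg (2C to 32C)
dna_pg = arabidopsis_pg + pepper_pg

# Hypothetical peak positions (arbitrary fluorescence units): an assumed gain
# of 10,000 units per pg, perturbed by made-up deviations of under 1%.
ASSUMED_GAIN = 10_000.0
made_up_deviation = [0.004, -0.006, 0.008, -0.003, 0.005,
                     -0.007, 0.002, -0.004, 0.001]
peaks = [ASSUMED_GAIN * pg * (1 + d)
         for pg, d in zip(dna_pg, made_up_deviation)]

slope, intercept, r_squared = linear_fit(dna_pg, peaks)
print(f"slope = {slope:.1f} units/pg")
print(f"intercept = {intercept:.1f} units")
print(f"r^2 = {r_squared:.5f}")
```

An intercept that is small relative to the peak positions, together with r² close to 1, is the pattern the text interprets as an absence of systematic error. With real data, the Table 1 peak positions would replace the `peaks` list.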
A number of conclusions can be drawn: (1) In these species, endoreduplication results in the precise and complete duplication of the nuclear genome, through at least three (arabidopsis) or four (pepper) endocycles. It should be recognized that, in a very small minority of plant groups, such as orchids, partial endoreduplication is seen; further details of this interesting phenomenon can be found in Trávníček et al. [8]. Although it is rarely encountered, care should be taken to accommodate its potential occurrence. (2) Nuclei occupying different size classes due to endocycling are quantitatively measured by the cytometer, meaning that geometric interactions between the size of the nucleus and the height of the illumination spot are not a factor affecting measurement efficiency. We can predict nuclear sizes, assuming a spherical shape (which will most likely be the case after the nuclei are isolated), based on the reported volume of Arabidopsis nuclei (presumptively 2C) being 32 cu. μm [9]; from this, r = 1.97 μm. Assuming nuclear volume scales linearly with DNA content, the largest endoreduplicated nuclei will occupy a volume of 32 × 101.12/0.32 = 10,112 cu. μm, predicting a sphere of radius 13.4 μm. The CytoFLEX illumination beam (spot size: 5 μm × 80 μm) therefore can be considered to act in slit-scanning mode for these larger nuclei, emphasizing the importance of acquiring area and not height measurements from the pulse-waveforms corresponding to the individual nuclei. It would be interesting to independently confirm the predicted sizes of endoreduplicated nuclei using fluorescence microscopy or, better perhaps, image cytometry. (4) Since DNA content scales so closely with endocycle status, this further implies that differences in chromatin packing, otherwise shown to influence flow cytometric DNA measurements using PI [10], are insignificant across the endocycles and the species measured here.
Parenthetically, this confirms earlier reports of linearity of duplication of plant genomes accompanying endocycles or autopolyploidization [7, 11, 12]. (5) Remarkably, the precise degree of scaling between the measured DNA contents of a single run, using the CytoFLEX instrument and identical experimental conditions, and those values identified as “prime” in the Kew database (Table 1), implies that the process of crowd-sourcing plant genome size measurements, across many laboratories, has converged on a meaningful relationship. This enhances the value of the content of the Kew Plant C-value database. (6) That the prime values in the database are not assigned to varieties, lines, or cultivars argues that the 2C nuclear DNA contents of the species selected for analysis here must be the same as, or very close to, these prime values.

Taken together, these observations lend confidence to the concept of data repositories, but only in the context of plant nuclear DNA content measurement using flow cytometry. One caveat is that, for any database, comparable strategies for evaluating the “quality” of the stored data must be devised to satisfy “Best Practices”, and this may not turn out to be possible in all cases. An obvious issue is the uncovering ex post facto of critical variables that had not been recorded in the databases and that are no longer available. For further discussions of Best Practices in Plant Cytometry, see Galbraith et al. [13], and references cited therein.

On-going support was provided by the US Department of Agriculture through the University of Arizona College of Agriculture and Life Sciences.

The peer review history for this article is available at https://publons.com/publon/10.1002/cyto.a.24493.

Appendix S1. Cytometry Part A. Author Checklist: MIFlowCyt-Compliant Items.
Appendix S2. Supporting Information.
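The nuclear-size estimate in conclusion (2) can be checked with a short calculation. This is a minimal sketch that assumes, as the text does, spherical nuclei whose volume scales linearly with DNA content. The function and constant names are illustrative, and treating the 5 μm dimension of the spot as the one each nucleus traverses is an assumption.

```python
import math


def sphere_radius(volume):
    """Radius of a sphere of the given volume (cu. um in, um out)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


VOLUME_2C_UM3 = 32.0   # reported volume of Arabidopsis nuclei [9]
DNA_2C_PG = 0.32       # Arabidopsis 2C prime value
DNA_MAX_PG = 101.12    # pepper 32C value
BEAM_HEIGHT_UM = 5.0   # short axis of the 5 x 80 um spot (assumed flow axis)

# Assumption from the text: nuclear volume is proportional to DNA content.
volume_max_um3 = VOLUME_2C_UM3 * DNA_MAX_PG / DNA_2C_PG

radius_2c_um = sphere_radius(VOLUME_2C_UM3)
radius_max_um = sphere_radius(volume_max_um3)

print(f"2C nucleus: volume {VOLUME_2C_UM3:.0f} cu. um, "
      f"radius {radius_2c_um:.2f} um")
print(f"Largest nucleus: volume {volume_max_um3:.0f} cu. um, "
      f"radius {radius_max_um:.2f} um")
print(f"Largest diameter / beam height: "
      f"{2 * radius_max_um / BEAM_HEIGHT_UM:.1f}")
```

For a 32 cu. μm sphere this gives a radius of about 1.97 μm, and for 10,112 cu. μm about 13.4 μm, so the largest nuclei are several times wider than the 5 μm beam dimension.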
