Abstract

Even in the molecular genomics era, as the floodgates open to deliver a deluge of DNA sequence data, knowledge of the total amount of DNA that comprises an organism's genome (= genome size) remains vital for many diverse fields of biology. Indeed, it may be considered a key biodiversity character with both practical and biological consequences for the organism. Knowledge of genome size is important, for example, in ongoing and planned sequencing programs to assess costs and to know how much DNA to sequence, or for determining how many large insert clones (e.g., bacterial artificial chromosomes [BACs] or fosmids) are needed for constructing genomic libraries. In addition, various genetic fingerprinting tools (e.g., microsatellites and AFLPs) have been shown to be sensitive to genome size, so knowledge of this character is important before embarking on such studies. From a biological perspective, variation in genome size has been shown to have diverse yet predictable consequences for an organism, influencing, for example, how it will respond to changes in carbon dioxide, rising temperatures, and pollution (1). While people have been estimating genome sizes in plants and animals for over 50 years, the last decade has seen a huge growth in the number of estimates published, not only to provide data for molecular studies but also for large scale comparative analyses seeking to understand the biological and evolutionary significance of the 40,000-fold range of genome sizes encountered across eukaryotes (2). Indeed, data are now available for over 10,000 species, accessible through the internet in three databases (Animal Genome Size Database: www.genomesize.com; Fungal Genome Size Database: www.zbi.ee/fungal-genomesize; and the Plant DNA C-values Database: http://data.kew.org/cvalues).
Nevertheless, as Doležel and Greilhuber point out in their paper “Nuclear genome size: Are we getting closer?” (3), despite this surge in data generation it has become increasingly clear that estimating genome size in absolute units (either picograms (pg) of DNA or number of base pairs (bp), usually Mbp) is still far from trivial, with numerous factors influencing the reliability of the data. The problem has been exacerbated by the replacement of Feulgen microdensitometry with flow cytometry (FCM) as the method of choice for genome size measurement (4), especially in plants, where 80% of estimates made since 2000 used FCM. This is because FCM is often perceived to be a quick, easy, and reliable method and hence, in many cases, has been used uncritically, resulting in the generation of dubious data. Fortunately, recent decades have also seen carefully conducted research into the application of FCM for genome size studies, and this has revealed several important factors that must be considered, both in sample preparation and in the interpretation of the results, if reliable FCM data are to be generated (4, 5). Some of these issues, which are discussed by Doležel and Greilhuber, have been settled in recent years, including: (i) widespread agreement on the conversion factor to use when changing between picograms and megabase pairs, i.e., 1 pg = 978 Mbp (6), (ii) refinement of terminology to avoid ambiguity (7), and (iii) the recognition that only intercalating fluorochromes are suitable for genome size studies, of which propidium iodide and ethidium bromide are the most widely used (4). However, other issues remain unresolved and are the focus of ongoing work. For example, it is now well recognized that cytosolic compounds can act as staining inhibitors, affecting the stoichiometry of fluorochrome binding to DNA and hence the accuracy of genome size estimates. However, solutions are less forthcoming.
Over the years, more than 50 buffers or buffer modifications have been developed for plants (8) and animals (9) in an attempt to overcome such problems. Nevertheless, there is clearly still much work ahead to optimize buffer composition so that it fully protects the DNA, avoids the negative effect of staining inhibitors, and keeps the DNA-fluorochrome complex stable for long enough to enable reliable genome size estimates to be obtained. Another aspect of which we are still profoundly ignorant is the effect of using different tissue types (e.g., leaves versus seeds in plants and blood cells versus hepatocytes in animals), which may differ in DNA compaction and other properties. Yet this is an area that clearly needs to be addressed, particularly as FCM researchers are increasingly extending the range of tissues used, often in an attempt to overcome problems arising from cytosolic inhibitors. Such studies are essential if meaningful insights are to be gained from comparative studies between species whose genome sizes have been estimated from different tissues. Probably one of the most important issues currently facing the field of genome size research is standardization. There is now no doubt that internal standardization (i.e., coprocessing an unknown sample with the reference material) is the sole option for genome size studies (4). However, the lack of agreement between genome size practitioners as to which species to use as calibration reference standards, and what genome size to assume for converting relative units into absolute DNA amounts, is a serious and controversial issue. Indeed, it is likely that this issue alone has contributed much more to the artefactual genome size variation apparent in the FCM literature than any other factor, with the possible exception of bias introduced by interfering secondary metabolites.
For example, while the use of alternative isolation buffers can result in differences in genome size estimates, these are usually small, even for challenging plant samples [<4% in our own data, 8% in bryophytes (10)]. In contrast, the genome size assigned to a reference species can differ by more than 3-fold [e.g., Arabidopsis thaliana (8)], leading to apparent differences in the calculated genome size depending on what value is used for conversion. The seriousness of the problem is dealt with in detail by Doležel and Greilhuber (3), who outline the issues at stake and suggest possible ways forward for ongoing genome size research. The ideal situation for standardizing the field would be to have one calibration standard (i) which meets the criteria outlined in Table 1, (ii) whose genome size has been accurately determined chemically or by sequencing techniques and agreed upon, and (iii) which is suitable for both animal and plant studies. However, it is clear from both a biological and methodological perspective that this is impossible. First, genome size varies extensively across many animal and plant groups [e.g., Chlorophyta (green algae): 2,300-fold; angiosperms: c. 2,000-fold; crustaceans: 460-fold; fishes: 380-fold; flatworms: 340-fold (1, 11)]. A single standard is, therefore, unsuitable for estimating genome sizes in all species of such groups because of problems of nonlinearity in the flow cytometer when samples with large differences in genome size are coprocessed. Ideally, the 2C peak of the target species should be located between the 2C and 4C peaks of the internal reference standard, and in no case should the standard and sample differ by more than ∼4-fold in genome size (note that the linearity guaranteed by flow cytometer manufacturers for samples differing 2-fold, e.g., G2/G1 nuclei, corresponds to a measured ratio of about 1.98–2.02).
Second, the suitability of using animal standards to estimate genome sizes in plants and vice versa has been called into question, and indeed in plants it is recommended that this should not be done (5). Third, only a small number of species have had their genome sizes accurately determined using chemical methods. In the plant community there is general agreement that the 2C-value of Allium cepa (onion), estimated using chemical approaches, is 33.5 pg (12). However, given that most angiosperms have considerably smaller genomes (the median 2C-value for 6,425 species is 5.0 pg), Allium is an unsuitable calibration standard for the majority of species. It is also clear that in the near future no help in the accurate determination of standards' genome size values can be expected from any of the “complete” genome sequencing programs. This is because the concept of a “complete” genome sequence often reflects what can be sequenced rather than what needs to be sequenced to cover all the DNA in the nucleus. For example, all nine “completely sequenced” plant genomes have reported gaps in their sequencing scaffolds. These are considered to reflect regions of repetitive DNA that are difficult to sequence and align. In some cases only the euchromatic portion of the genome is reported to be sequenced and assembled into scaffolds (e.g., Populus trichocarpa), leaving up to 30% of the genome unsequenced or unassembled and presumed to correspond to heterochromatic DNA. In others no attempt has been made to estimate a genome size from the sequence data. Instead, the amount of DNA sequenced is based on a prior estimate of genome size made using flow cytometry (e.g., Cucumis sativus, Vitis vinifera, Zea mays).
The situation in animals is more controversial, with various genome size estimates reported for human (Homo sapiens) made using chemical approaches (2C-values range from 6.0 to 7.0 pg, with no mention of whether the samples analyzed were male or female, even though the presence of sex chromosomes will lead to genome size differences between the sexes). In many studies a 2C-value of 7.0 pg has been used although, as Doležel and Greilhuber (3) note, this is likely to be an overestimate. Nevertheless, in contrast to plants, there is more optimism that the great efforts being made to close the remaining gaps in the assembled “completely sequenced” genome will result in a precisely known size of the human genome within a reasonable time period. As the need for calibration standards whose genome size falls close to that of the species of interest has increased in importance, several plant and animal genome size researchers have used Allium or human as a primary standard to measure a selection of secondary standards in a cascade-like manner (see Table 2 for the list of the most widely used plant and animal standards). This has produced some stability in the field, with many researchers adopting the values given, for example, by Doležel et al. (13) for plants and those of Tiersch et al. (14) for animals. Nevertheless, researchers have increasingly made their own reference standards by calibrating them against another species whose genome size has already been estimated. This has led to a huge expansion in the number of plant and animal species used for calibration. Indeed, according to the plant and animal genome size databases mentioned above, over 50 plant and 70 animal species have been used, with a wide range of assumed DNA amounts (8). Clearly this situation is untenable given the knock-on consequences, noted above, of assuming different DNA amounts for a calibration standard.
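The knock-on effect of the assumed standard value can be illustrated with a minimal sketch. It assumes the usual proportional calculation for internal standardization (sample 2C-value = sample peak mean / standard peak mean × assumed 2C-value of the standard), which the text implies but does not spell out. The function names and the peak positions are hypothetical and for illustration only; the 6.0 pg and 7.0 pg values are the ends of the range reported for human, and the 4-fold limit is the one stated above.

```python
def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg):
    """Estimate a sample's 2C-value (pg) from the G1 peak means of a
    sample and an internal standard processed together.

    Assumes fluorescence is proportional to DNA content.
    """
    if sample_peak <= 0 or standard_peak <= 0 or standard_2c_pg <= 0:
        raise ValueError("peak means and standard 2C-value must be positive")
    return sample_peak / standard_peak * standard_2c_pg


def within_fold_limit(sample_peak, standard_peak, max_fold=4.0):
    """Return True if sample and standard differ by no more than max_fold."""
    ratio = sample_peak / standard_peak
    return max(ratio, 1.0 / ratio) <= max_fold


# Hypothetical peak means (arbitrary fluorescence channel units).
sample_peak, standard_peak = 120.0, 200.0

# The same measurement converted with two assumed values for the standard.
for assumed_2c in (7.0, 6.0):
    estimate = sample_2c_pg(sample_peak, standard_peak, assumed_2c)
    print(f"standard assumed {assumed_2c:.1f} pg -> sample {estimate:.2f} pg")

print("within 4-fold limit:", within_fold_limit(sample_peak, standard_peak))
```

Under these assumptions the two estimates differ by the same proportion as the two assumed standard values (7.0/6.0, about 17%), whatever the peak positions are.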
Doležel and Greilhuber (3), therefore, suggest a sensible and appropriate holding solution to bring some stability to the field: “the logical strategy is to calibrate a primary ‘gold’ reference standard and then perform a series of experiments to calibrate secondary reference standards in an ordered sequence.” They recommend using Homo sapiens (human) with a 2C-value of 7.0 pg as the overall primary “gold” standard, together with Pisum sativum (garden pea) as the primary plant standard using 2C = 9.09 pg (based on calibrating it against human with 2C = 7.0 pg). While Doležel and Greilhuber accept that the value for human may be too high, the possibility of linking animal and plant studies in this way opens up the opportunity to make realistic comparisons across eukaryotes. Further, once the human genome is truly completely sequenced, all values can be adjusted accordingly. As for which species to adopt as the secondary standards, there are many to choose from, but a number of commonly used ones are listed in Table 2 (together with their potential advantages and disadvantages). Given the current situation, the recommendations of Doležel and Greilhuber are timely and appropriate, and it is urged that they be adopted by all practitioners in the field of genome size research.
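The adjustment step mentioned above can be sketched as follows, assuming every value in the cascade was derived proportionally from the primary standard, so that a revised primary value rescales all others by the same factor. The function names are hypothetical, and the revised human value of 6.5 pg is an arbitrary placeholder chosen for illustration, not a measured or recommended figure. The 7.0 pg, 9.09 pg, and 1 pg = 978 Mbp values are taken from the text.

```python
MBP_PER_PG = 978.0  # conversion factor cited in the text: 1 pg = 978 Mbp


def pg_to_mbp(pg):
    """Convert a DNA amount from picograms to megabase pairs."""
    return pg * MBP_PER_PG


def rescale_standards(values_pg, old_primary_pg, new_primary_pg):
    """Rescale all 2C-values calibrated against a primary standard.

    Assumes each value is proportional to the primary standard's value,
    as in a cascade of calibrations by internal standardization.
    """
    factor = new_primary_pg / old_primary_pg
    return {species: value * factor for species, value in values_pg.items()}


# 2C-values (pg) recommended in the text.
standards = {"Homo sapiens": 7.0, "Pisum sativum": 9.09}

# Hypothetical revised human value, for illustration only.
revised = rescale_standards(standards, old_primary_pg=7.0, new_primary_pg=6.5)

for species, value in revised.items():
    print(f"{species}: {value:.2f} pg = {pg_to_mbp(value):.0f} Mbp")
```

Because only a single factor changes, published estimates that state which standard and which assumed value were used could be updated in the same way without being remeasured.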
