DNA barcoding is now well established in animals based on sequences from a 648 base pair (bp) portion of the mitochondrial coding gene cytochrome oxidase 1 (CO1) (Hebert & al., 2003). In contrast, it has proved more difficult to identify a suitable DNA barcoding locus (or loci) for land plants (reviewed by Hollingsworth, 2008). The generally low substitution rate of plant mitochondrial DNA (e.g., Fazekas & al., 2008) has led to investigations into the relative performance and information content of different loci from the plastid genome as alternative DNA barcodes for plants (e.g., Chase & al., 2007; Kress & Erickson, 2007; Fazekas & al., 2008; Lahaye & al., 2008; CBOL Plant Working Group, 2009; Ford & al., 2009; Hollingsworth & al., 2009). At the third International Barcode of Life Conference in Mexico City (November 2009), the Consortium for the Barcode of Life (CBOL) announced that the standard core DNA barcode for land plants would consist of a two-locus combination comprising portions of the protein-coding plastid genes rbcL and matK, to be supplemented with additional loci as required. In making this recommendation, two potential problems were recognized. Firstly, successful amplification and sequencing of the matK barcoding region can be difficult in some taxa with existing primers, and further primer and protocol development is required for this locus. Secondly, the rbcL+matK barcode will not lead to 100% species discrimination in many plant groups, and additional loci beyond this core barcode will be needed to increase levels of species discrimination in these cases. In recognition of these problems, an 18-month review period on the performance of the rbcL+matK barcode has been established (completion due in mid-2011).
During this review period, the executive committee of CBOL recommended that the plant barcoding community continue to collect data from other strongly performing candidate barcoding loci such as the non-coding plastid intergenic spacer trnH-psbA and the internal transcribed spacers (ITS) of nuclear ribosomal DNA. One concern that has been raised regarding the use of non-coding plastid regions such as trnH-psbA as DNA barcodes is the presence of microsatellite repeats, which can make it difficult to obtain bidirectional sequences in some samples (Fazekas & al., 2008; CBOL Plant Working Group, 2009; Devey & al., 2009). Verification of the sequenced strand through bidirectional sequencing is desirable for the maintenance of data quality standards, as recommended by the CBOL Database Working Group for sequences destined to receive annotation with the reserved keyword 'Barcode' by the INSDC (DDBJ, EMBL & GenBank). However, PCR amplicons derived from regions containing microsatellites can produce poor-quality sequence chromatograms. Slippage of the polymerase at microsatellite regions during PCR results in a 'stutter' effect observed in the chromatogram, with sequence quality decreasing as repeat length increases. The reduction in sequence quality from regions with mononucleotide runs of fewer than ten repeats is generally moderate, and ambiguous base calls by the software are usually few, requiring little editing by hand. However, as the repeat number increases, the number of ambiguous bases increases disproportionately. Editing then takes longer (with a corresponding reduction in confidence in the true sequence), to the point where sequence data past the repeat cannot be used at all. The net effect is often a contig with overlapping data only at the repeat, and a shortened read length due to missing data at the ends (Fig. 1).
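As a rough illustration of the repeat-length issue described above, the following Python sketch scans a sequence for mononucleotide runs at or above a chosen length, which could be used to flag amplicons likely to give stuttered reads. The function name `find_mononucleotide_runs`, the default cut-off of ten bases, and the example sequence are assumptions made here for illustration; they are not taken from any of the cited studies or from a CBOL protocol.

```python
import re


def find_mononucleotide_runs(seq, min_len=10):
    """Return (start, base, length) for each run of a single base >= min_len.

    `start` is a 0-based index into `seq`. The default min_len of 10 is an
    illustrative assumption, not a published threshold.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    seq = seq.upper()
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq)]


# Hypothetical sequence: a 12-base A run and an 8-base T run.
example = "ACGT" + "A" * 12 + "CGCG" + "T" * 8 + "GCA"
runs_default = find_mononucleotide_runs(example)
runs_short = find_mononucleotide_runs(example, min_len=8)
```

With the default cut-off only the 12-base run is reported; lowering `min_len` to 8 also reports the shorter T run. The appropriate cut-off for a given sequencing chemistry would need to be chosen empirically.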
In more extreme cases, where multiple mononucleotide repeats occur within a given amplicon, disruption of forward and reverse sequencing reads at different mononucleotide runs can lead to only partial sequences of the region with no overlap, thus preventing the construction of a sequence contig. The problem caused by mononucleotide repeats in non-coding regions was noted by the CBOL Plant Working Group (2009) when evaluating the attributes of different barcoding loci for plants, and this contributed towards the selection of an entirely coding core plant barcode. Fixing the 'mononucleotide repeat problem' was considered less tractable than the challenge of improving primer universality for matK. Recently, however, an evaluation of PCR methods and DNA polymerases, focusing on the reduction of the stutter effects resulting from mononucleotide repeats, has demonstrated that the use of particular polymerases can reduce the effect of slipped strand mispairing in PCR (Fazekas & al., 2010). This recent study tested the performance of different PCR profiles, reaction chemistries and polymerases on sequences from the trnH-psbA spacer from 25 plant samples. These samples were specifically selected because they contain mononucleotide repeats which had previously resulted in ambiguous base calls and low-quality sequence traces with conventional PCR and