Abstract
Background
Microsatellites (repeated subsequences based on motifs of one to six nucleotides) are widely used as codominant genetic markers because of their frequent polymorphism and relative selective neutrality. Minisatellites are repeats of motifs having seven or more nucleotides. The large number of EST sequences now available in public databases offers an opportunity to compare microsatellite and minisatellite properties and evaluate their evolution over a broad range of plant taxa.
Results
Repeated motifs from one to 250 nucleotides long were identified in 6793306 expressed sequence tags (ESTs) from 88 genera of vascular plants, using a custom data-processing pipeline that allowed limited variation among repeats. The pipeline processed trimmed but otherwise unfiltered sequence and output nonredundant loci of at least 15 nucleotides, with degree of polymorphism and PCR primers wherever possible. Motifs that were an integral multiple of three in length were more abundant and richer in G/C than other motifs. From 80 to 85% of minisatellite motifs represented repeats within proteins, up to the 228-nucleotide repeat of ubiquitin, but not all of these repeats preserved reading frame. The remaining 15 to 20% of minisatellite motifs were associated with transcribed repetitive elements, e.g., retrotransposons. Relative microsatellite motif frequencies did not correlate tightly to phylogenetic relationship. Evolution of increased microsatellite and EST GC content was evident within the grasses. Microsatellites were less frequent in the transcriptome of genera with large genomes, but there was no evidence for greater dilution of the transcriptome with transposable element transcripts in these genera.
Conclusion
The relatively low correlation of microsatellite spectrum to phylogeny suggests that repeat loci evolve more rapidly than the surrounding sequence, although tissue specificity of the different EST libraries is a complicating factor. In-frame motifs are more abundant and higher in GC than frame-shifting motifs, but most EST minisatellite loci appear to represent repeats in translated sequence, regardless of whether reading frame is preserved. Motifs of four to six nucleotides are as polymorphic in EST collections as the commonly used motifs of two and three nucleotides, and they can be exploited as genetic markers with little additional effort.
Highlights
Microsatellites are widely used as codominant genetic markers because of their frequent polymorphism and relative selective neutrality
The phrap step reduced the total number of nucleotides to 836433533 in the nonredundant set, for a 77.3% reduction of total sequence length
The analysis considered perfect microsatellites, and imperfect microsatellites with a 10% tolerance at minimum locus lengths of 15 and 20 nucleotides
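To make the notion of a perfect microsatellite at a minimum locus length concrete, the following Python sketch scans a sequence for uninterrupted tandem repeats of motifs one to six nucleotides long and reports those spanning at least 15 nucleotides. It is an illustration only, not the authors' pipeline: the function names `find_perfect_repeats` and `is_primitive` are hypothetical, and imperfect repeats (the 10% tolerance mentioned above) are not handled because the summary does not specify how the tolerance is applied.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d)
               for d in range(1, k) if k % d == 0)


def find_perfect_repeats(seq, max_motif=6, min_len=15):
    """Return (start, end, motif) for perfect tandem repeats.

    Coordinates are 0-based, end-exclusive. A locus is reported when a
    run with period k (1 <= k <= max_motif) spans at least min_len
    nucleotides and at least two motif copies. Assumed thresholds follow
    the 15-nucleotide minimum quoted in the text.
    """
    s = seq.upper()
    n = len(s)
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= n:
            # Extend the run while each base equals the base k positions back.
            j = i + k
            while j < n and s[j] == s[j - k]:
                j += 1
            motif = s[i:i + k]
            if j - i >= max(min_len, 2 * k) and is_primitive(motif):
                hits.append((i, j, motif))
            # Starts between i and j - k would only yield sub-runs of this run.
            i = max(i + 1, j - k + 1)
    return hits


if __name__ == "__main__":
    example = "GATTACA" + "AG" * 10 + "CCGT"
    for start, end, motif in find_perfect_repeats(example):
        print(start, end, motif, (end - start) / len(motif))
```

Filtering out non-primitive motifs keeps a dinucleotide repeat such as (AG)n from also being reported as a tetra- or hexanucleotide repeat.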
Summary
Data source
DNA sequences were extracted from gbest flat files in GenBank release 150, dated 15 October 2005 [41]. Sequences were counted by genus, and sequences were written to separate flat files for all 88 genera that were represented by at least 3000 sequences.
Removal of vector and low-quality sequence
Each genus was processed independently. Its sequence file was first searched for vector sequences by blastn [42] against UniVec [43]. A Perl 5.8 script removed the vector-matching subsequences, except for the telomeric repeats of YAC vectors. A second Perl script trimmed off tracts of at least 10 consecutive A's or T's within the first 60 nucleotides from either end of the vector-trimmed sequences, plus any sequence distal to these tracts.
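The poly-A/poly-T trimming rule can be sketched as follows. This is a hedged Python illustration rather than the authors' Perl script, which is not reproduced in the summary. The function name `trim_poly_at` is hypothetical, and three details are assumptions: a tract counts as "within" the window if it starts (5' end) or ends (3' end) inside it, the innermost qualifying tract is used, and the window is capped at half the sequence length so that one tract is never treated as belonging to both ends of a short read.

```python
import re


def trim_poly_at(seq, min_run=10, window=60):
    """Trim poly-A/poly-T tracts near either end, plus anything distal to them.

    Assumptions (not stated in the source): a tract qualifies if it starts
    within `window` nt of the 5' end or ends within `window` nt of the
    3' end; the window is capped at half the read length.
    """
    s = seq.upper()
    n = len(s)
    w = min(window, n // 2)
    tract = re.compile("A{%d,}|T{%d,}" % (min_run, min_run))

    start, end = 0, n
    for m in tract.finditer(s):
        if m.start() < w:
            # 5' side: drop the tract and everything before it.
            start = max(start, m.end())
        if m.end() > n - w:
            # 3' side: drop the tract and everything after it.
            end = min(end, m.start())
    return seq[start:end] if start < end else ""


if __name__ == "__main__":
    read = "GG" + "T" * 12 + "ACGT" * 30 + "A" * 15
    print(len(read), len(trim_poly_at(read)))
```

Tracts that lie entirely outside both windows are left in place, which matches the text's restriction of trimming to the sequence ends.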