Abstract
Background: Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-increasing volume of DNA data generated by high-throughput techniques offers an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as a source of SSRs for plants of economic relevance, but their application to non-model species is still modest.

Methods: Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes, Arabidopsis and Oryza, for EST-SSRs and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by aligning the SSR-containing sequences against the Oryza sativa and Arabidopsis thaliana reference genomes with the Basic Local Alignment Search Tool (BLAST) on the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea.

Results: We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types, but the abundance of the various repeat types differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions, while dinucleotide repeats were largely associated with untranslated regions. Our results from the empirical test revealed considerable amplification success and transferability between congenerics.

Conclusions: The present work represents the first large-scale study developing SSRs from publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2031-1) contains supplementary material, which is available to authorized users.
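To make the mining step more concrete, the following is a minimal sketch of how perfect SSRs could be detected in EST sequences and tallied by motif length (dinucleotide, trinucleotide, and so on). It is not the QDD pipeline used in the study and does no primer design. The function names `find_ssrs` and `tally_by_motif_length` and the minimum repeat thresholds are assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Assumed, illustrative thresholds: motif length -> minimum number of repeats.
# These are NOT the settings used by QDD in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AAA" or "AGAG"), so each repeat is reported once,
            # under its shortest motif only.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def tally_by_motif_length(sequences):
    """Count detected SSRs per motif length across an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        for _, motif, _ in find_ssrs(seq):
            counts[len(motif)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence containing one (AG)8 and one (CTT)6 repeat.
    example = "GGC" + "AG" * 8 + "TTCA" + "CTT" * 6 + "GA"
    print(find_ssrs(example))
    print(tally_by_motif_length([example]))
```

A tally of this kind corresponds to the comparison of repeat types between taxonomic groups reported in the Results. Dedicated tools additionally handle imperfect repeats, redundancy among ESTs and primer design, none of which this sketch attempts.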
Highlights
Simple Sequence Repeats (SSRs) are widely used in population genetic studies but their classical development is costly and time-consuming
Although only sequences assigned to Oryza were downloaded from dbEST, 23.8 % of the analyzed Expressed Sequence Tag (EST) sequences containing SSRs did not return a significant hit in the BLASTn search against the Oryza sativa reference genome
Our results strongly support the use of existing EST databases for SSR discovery in non-model plants, as a bench tool for geneticists and molecular ecologists working on evolutionary and/or conservation studies
Summary
Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. Expressed Sequence Tags (ESTs) have been widely used as a source of SSRs for plants of economic relevance, but their application to non-model species is still modest. The analysis of DNA variation is a key component of plant genetic studies addressing aspects such as evolution, phylogeny or conservation [1,2,3]. Among the various types of molecular markers used for these purposes, SSRs are often regarded as the markers of choice because of their abundance, multiallelic behavior, high polymorphism and codominant inheritance [4]. Genomic SSRs are usually species-specific, meaning that markers developed for one taxon cannot always be transferred directly to another [6]. The rates of successful cross-species transferability vary greatly between taxonomic groups [7].