Abstract
Simple sequence repeats (SSRs), also known as microsatellites, are ubiquitous short tandem repeats found in the genomes and transcriptomes of diverse organisms. They are among the most powerful molecular markers for genetic analysis and breeding programs because of their high mutation rate and neutral evolution. However, traditional experimental screening of SSR polymorphism, and of the subsequent applicability of SSRs to genetic studies, is extremely labor-intensive and time-consuming. The recent drop in the cost of next-generation sequencing and the increasing availability of large genome and transcriptome sequences provide an excellent opportunity, and abundant source material, for large-scale mining of this type of molecular marker. However, current tools are limited. We therefore developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) from multiple assembled sequences. The pipeline allows users to identify putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. In addition, two confidence metrics, the standard deviation and the missing rate of the SSR repeat numbers, are provided to systematically assess how suitable the detected PolySSRs are for subsequent genetic characterization. Primer pairs for each identified PolySSR are also designed automatically and further evaluated by the global sequence similarity of the primer-binding regions, which improves the success rate of marker development. Screening rice genomes with CandiSSR, followed by experimental validation, showed an accuracy rate of over 90%. CandiSSR has also successfully identified a large number of PolySSRs in Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker sources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html.
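To make the two confidence metrics named in the abstract concrete, the following minimal Python sketch computes them for a single SSR locus. It is an illustration under stated assumptions, not the CandiSSR implementation: the function name `ssr_confidence`, the use of `None` to mark a sample in which the locus was not found, and the choice of the population (rather than sample) standard deviation are all assumptions made here.

```python
from statistics import pstdev
from typing import Optional, Sequence, Tuple


def ssr_confidence(repeat_counts: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (standard deviation, missing rate) for one SSR locus.

    repeat_counts holds the repeat number observed in each sample;
    None marks a sample in which the locus was not found (assumed encoding).
    """
    if not repeat_counts:
        raise ValueError("need at least one sample")
    observed = [c for c in repeat_counts if c is not None]
    # Fraction of samples in which the locus is absent.
    missing_rate = 1 - len(observed) / len(repeat_counts)
    # Population standard deviation of the observed repeat numbers
    # (assumption: the abstract does not say which form the pipeline uses).
    sd = pstdev(observed) if observed else 0.0
    return sd, missing_rate


# Hypothetical locus seen with 8, 10 and 12 repeats, and absent in one sample.
sd, missing = ssr_confidence([8, 10, None, 12])
print(round(sd, 3), missing)
```

Under this reading, a larger standard deviation indicates more variation in repeat number across samples, and a lower missing rate indicates that the locus was recovered in more of them.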
Highlights
Simple sequence repeats (SSRs; also called microsatellites), which contain repetitive sequences of 1–6 bp in length, have been found extensively in both the coding and non-coding sequences of eukaryotic and prokaryotic genomes (Tautz and Renz, 1984; Gupta et al., 1996; Li et al., 2002; Zhang et al., 2004).
The majority of expressed sequence tag (EST)-SSR loci are present in functional genes, indicating that these markers may be associated with significant phenotypes.
PolySSRs; and (12) design primer pairs and computationally assess the global similarity of primer binding regions for each PolySSR. All these steps are automatically implemented in one Perl script, CandiSSR.pl; the pipeline also includes additional components implemented in Bash shell.
Summary
Simple sequence repeats (SSRs; also called microsatellites), which contain repetitive sequences of 1–6 bp in length, have been found extensively in both the coding and non-coding sequences of eukaryotic and prokaryotic genomes (Tautz and Renz, 1984; Gupta et al., 1996; Li et al., 2002; Zhang et al., 2004). They are broadly applied in various areas of genetic studies, including the evaluation of genetic variation (Kashi et al., 1997), construction of genetic linkage maps (Jones et al., 2002), QTL analysis (Mei et al., 2004; Minamiyama et al., 2007), positional cloning, and molecular marker-assisted selection in plant and animal breeding programs (Mohan et al., 1997; Collard and Mackill, 2008). Because traditional laboratory assessment of SSR polymorphism, and of the subsequent applicability of SSRs to genetic studies, is inefficient, few polymorphic SSRs (PolySSRs) have been identified so far, which largely hampers efficient use of the abundant SSR resources that genetic studies and breeding efforts urgently need.
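As a concrete illustration of the definition above (tandem repeats of a 1–6 bp motif), the sketch below scans a DNA string for SSRs with regular expressions. It is a hedged example, not the detection method used by CandiSSR, which this summary does not describe: the `MIN_REPEATS` thresholds, the `SSR` record, and the `find_ssrs` function are all hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeated unit, 1-6 bp
    repeats: int  # number of tandem copies


# Hypothetical minimum repeat counts per motif length (assumed, not from the source).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect tandem repeats of 1-6 bp motifs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
            # so each locus is reported once, under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append(SSR(m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


example = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "TT"
print(find_ssrs(example))
# [SSR(start=3, motif='AT', repeats=7), SSR(start=20, motif='CAG', repeats=5)]
```

Running such a scan on the same locus in several assembled sequences and comparing the repeat counts is one simple way to flag a locus as a candidate PolySSR.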