Abstract
Microsatellites or simple sequence repeats (SSRs) play an important role in plant genetics and breeding. The development of microsatellites is a time-consuming and expensive process. In silico mining of microsatellites from expressed sequence tags (ESTs), which are available in electronic molecular databases, is a cheaper alternative. In the present study, we used a computational approach for mining SSRs from 20,159 ESTs in Allium cepa. These onion ESTs represented a total length of 13.2 Mb and were downloaded from the dbEST database of the National Center for Biotechnology Information (NCBI) and subjected to various pre-processing steps. The pre-processed ESTs were clustered, resulting in non-redundant unigenes. These unigenes were analysed for their SSR content and distribution. In all, 1,464 SSRs consisting of di-, tri-, tetra-, penta- and hexa-nucleotide repeats were mined from the non-redundant ESTs (contigs and singletons). Tri-nucleotide SSRs were the most abundant, followed by tetra-, di-, hexa- and penta-nucleotide SSRs. Among the tri-coding repeats, leucine and serine codons were more abundant. The SSR-containing sequences were annotated and grouped into their respective functional categories. The predominant functional group among the annotated unigenes was “metabolism”, followed by “transcription factors” and “transporter proteins”. Primer pairs could be designed for 1,092 SSR-containing sequences. Of these, 51 primer pairs were validated in the laboratory. A database has been developed to store the unigenes, primer pairs, putative annotations, and BLAST results. After validation, the EST-derived microsatellite (SSR) markers can be used in studies related to marker-assisted selection, detection of polymorphism, DNA fingerprinting, and diversity analysis in onion.
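To make the SSR-mining step concrete, the following is a minimal sketch of how di- to hexa-nucleotide repeats can be located in a sequence with regular-expression backreferences. It is not the tool used in the study, which the abstract does not name. The function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, since the abstract does not report its thresholds.

```python
import re

# Assumed thresholds (motif length -> minimum number of tandem copies).
# These values are illustrative only; the abstract does not state them.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, copies) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each SSR is reported once at its
            # true motif length.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CC" + "CTT" * 5 + "GA"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

In a pipeline like the one summarised above, a function of this kind would be applied to each non-redundant unigene, and the resulting motif lengths tallied to obtain the distribution of di- through hexa-nucleotide SSRs.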