Abstract

Background: Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt motifs present in all genomes. Emerging evidence points to their role in cellular processes and gene regulation. Despite the huge resource of genomic information currently available, SSRs have been studied in a limited context and compared across relatively few species.

Results: We have identified ~685 million eukaryotic microsatellites and analyzed their genomic trends across 15 taxonomic subgroups from protists to mammals. The distribution of SSRs reveals taxon-specific variations in their exonic, intronic and intergenic densities. Our analysis reveals the differences among non-related species and novel patterns uniquely demarcating closely related species. We document several repeats common across subgroups as well as rare SSRs that are excluded almost throughout evolution. We further identify species-specific signatures in pathogens like Leishmania as well as in cereal crops, Drosophila, birds and primates. We also find that distinct SSRs preferentially exist as long repeating units in different subgroups; most unicellular organisms show no length preference for any SSR class, while many SSR motifs accumulate as long repeats in complex organisms, especially in mammals.

Conclusions: We present a comprehensive analysis of SSRs across taxa at an unprecedented scale. Our analysis indicates that the SSR composition of organisms with heterogeneous cell types is highly constrained, while simpler organisms such as protists, green algae and fungi show greater diversity in motif abundance, density and GC content. The microsatellite dataset generated in this work provides a large number of candidates for functional analysis and for studying their roles across the evolutionary landscape.

Highlights

  • Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt motifs present in all genomes

  • We identified a total of 684,885,656 perfect SSRs and analyzed their distribution patterns across organisms divided into 5 main groups constituting 15 subgroups (Additional file 1: Table S1)

  • To normalize SSR occurrence to genome size, we examined the density of SSRs
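
The density normalization mentioned in the last highlight can be illustrated with a minimal sketch. It assumes density means the number of SSRs per megabase of genome; the text above does not spell out the exact unit, and the function name and the numbers in the usage line are hypothetical.

```python
def ssr_density_per_mb(ssr_count, genome_size_bp):
    """Return SSRs per megabase (assumed definition of density)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    # Divide the raw count by the genome size expressed in megabases,
    # so that genomes of very different sizes become comparable.
    return ssr_count / (genome_size_bp / 1_000_000)


# Hypothetical example: 5,000 SSRs in a 10 Mb genome.
print(ssr_density_per_mb(5000, 10_000_000))
```

Dividing by genome size keeps a large genome from appearing SSR-rich simply because it contains more sequence.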


Summary

Introduction

Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt DNA motifs present in all genomes. They comprise a significant portion of the genome in complex organisms, often surpassing the proportion of coding sequences [1]. SSRs account for 3% of the human genome [2] and display a non-random distribution in many genomes [1, 3]. They have high mutation rates due to polymerase slippage, with a bias towards elongation [4]. Recent studies have focused on the role of SSRs in cellular processes such as the epigenetic regulation of gene expression [11,12,13] and genome organization [14]. A comprehensive analysis of these elements across the evolutionary landscape can help identify functionally relevant SSRs, but in silico studies have mostly been limited by the efficiency, exhaustiveness and sensitivity of the SSR identification programs they use, and can be compromised by the quality of the SSR datasets generated [15, 16].
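
To make the definition concrete, the following is a minimal sketch of how perfect SSRs (uninterrupted tandem repeats of a 1–6 nt motif) might be located in a sequence. It is not the identification program used in the study. The 12 nt minimum repeat length, the function names and the test sequence are all assumptions chosen for illustration.

```python
import re

# Assumed cutoff: minimum total repeat length in nt (not taken from the source).
MIN_LENGTH = 12


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(sequence, min_length=MIN_LENGTH):
    """Return sorted (start, end, motif, copies) tuples for perfect 1-6 nt repeats."""
    sequence = sequence.upper()
    hits = []
    for motif_len in range(1, 7):
        # Copies needed to reach min_length (ceiling division), at least two.
        min_copies = max(2, -(-min_length // motif_len))
        # Capture one motif, then require it to repeat back-to-back.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter motif
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Hypothetical test sequence containing an (AT)8 and an (A)12 repeat.
example = "GGC" + "AT" * 8 + "CCGT" + "A" * 12 + "TTG"
for start, end, motif, copies in find_perfect_ssrs(example):
    print(start, end, motif, copies)
```

A regular expression with a backreference is convenient for short examples, but it scans the sequence once per motif length. The efficiency and exhaustiveness concerns raised above are the reason genome-scale analyses rely on dedicated tools instead.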

