Abstract

Currently, there is a lack of bioinformatics approaches for identifying highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions. Comparison of the RPWM algorithm with other methods of TR identification showed that RPWM could detect TRs in which the average number of base substitutions per nucleotide (x) was between 1.5 and 3.2, whereas the T-REKS and TRF methods could not detect divergent TRs with x > 1.5. Applied to the rice genome, the RPWM method revealed that TRs occupied 5% of the genome and that most of them were 2 or 3 bases long. Using RPWM, we also found a correlation of TRs with dispersed repeats and transposons, suggesting that some transposons originated from TRs. Thus, the novel RPWM algorithm is an effective tool to search for highly divergent TRs in genomes.
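The abstract does not specify how RPWMs are generated or optimized, so the following is only an illustrative sketch of the underlying data structure, not the authors' algorithm. It assumes a position weight matrix with one column per position of a repeat period, filled with random weights, and scores a sequence by cycling through the columns without gaps. The function names, the weight range, and the gap-free scoring are all assumptions made for this example; the published method also handles insertions and deletions.

```python
import random

BASES = "ACGT"

def random_pwm(period, seed=0):
    """Build a hypothetical random position weight matrix.

    One column (a dict of base -> weight) per position in the period.
    The uniform [-1, 1] weight range is an assumption for illustration.
    """
    rng = random.Random(seed)
    return [{b: rng.uniform(-1.0, 1.0) for b in BASES} for _ in range(period)]

def cyclic_score(seq, pwm):
    """Score seq against pwm, reusing columns cyclically (no indels)."""
    period = len(pwm)
    return sum(pwm[i % period][b] for i, b in enumerate(seq))

pwm = random_pwm(period=3, seed=1)
# Two perfect copies of a unit score twice as much as one copy.
print(cyclic_score("ACGACG", pwm), 2 * cyclic_score("ACG", pwm))
```

Under this gap-free scoring, every exact copy of a unit adds the same amount to the total, which is why a periodic sequence can stand out against a matrix of matching period.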

Highlights

  • We have developed a new mathematical method to search for tandem repeats (TRs), which uses a new algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs) [32,33,34]

  • The results indicated that the RPWM algorithm could identify highly divergent TRs

  • TRs with a total length of more than 20 nucleotides can be effectively detected using this procedure. This means that the minimum number of repeat copies that RPWM can detect is approximately 20/n, where n is the length of the period
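The 20/n relation in the last highlight can be made concrete with a short calculation. This is a minimal sketch, assuming the 20-nucleotide threshold from the highlight; rounding up to a whole number of copies is an assumption of this example, as is the function name.

```python
import math

MIN_TOTAL_LENGTH = 20  # nucleotides, taken from the highlight

def min_copies(period):
    """Approximate number of repeat copies needed to reach the threshold."""
    return math.ceil(MIN_TOTAL_LENGTH / period)

for n in (2, 3, 5, 10):
    print(f"period {n}: about {min_copies(n)} copies")
# period 2: about 10 copies
# period 3: about 7 copies
# period 5: about 4 copies
# period 10: about 2 copies
```

Short periods therefore need many copies to reach a detectable total length, while longer periods need only a few.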

Summary

Introduction

The rapid development of sequencing techniques in recent years has allowed determination of complete genome sequences for many eukaryotic species [1]. A large amount of data on various types of nucleotide sequences has been accumulated, leading to challenges in the determination of their functional significance and evolutionary origin. Many computational methods have been developed for the functional annotation of various DNA sequences, including algorithms for the search of coding regions, promoters, transposons, and short and long interspersed nuclear elements (SINE and LINE, respectively) [2]. The identification of satellite DNA tandem repeats (TRs), including minisatellites and microsatellites, is a part of the genome annotation task. Mini- and microsatellites are short repeats 6–100 and 2–5 bases long, respectively [3,4]. Microsatellites are the most prevalent and the most studied; they are used as molecular markers, in particular, to assess the genetic diversity of agricultural plant and animal species [5].
