Abstract

Biological macromolecules such as DNA, RNA, and proteins can be regarded as finite sequences of symbols (words) over a finite alphabet. In this paper we consider DNA (RNA) sequences, which are words over a four-letter alphabet. Some “genes”, or fragments of them, are compared with random sequences and with randomly reshuffled sequences of the same length over the same alphabet. Some combinatorial techniques for the analysis of finite words are developed. A crucial role in the comparison is played by the so-called special factors of a given word. In all the DNA (RNA) fragments analysed, the distribution of the number of right (left) special factors as a function of their length differs, in a very typical way, from the corresponding distribution for a string of the same length over the same alphabet that is either generated by a random source or obtained by randomly shuffling the original string. This difference is independent of the length of the fragment, within the range we have considered (<2650 bp), and of its phylogenetic origin.
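
To make the central notion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard combinatorics-on-words definition, which the abstract itself does not spell out: a factor u of a word w is right special if ua and ub are both factors of w for two distinct letters a and b (left special is the mirror-image condition). The function names `right_special_counts`, `left_special_counts`, and `shuffled` are hypothetical, chosen for illustration only.

```python
import random
from collections import defaultdict


def right_special_counts(word, max_len=None):
    """Map each length n to the number of right special factors of that length.

    Assumed definition: a factor u is right special in `word` if it is
    followed by at least two distinct letters somewhere in `word`.
    """
    if max_len is None:
        max_len = len(word) - 1
    counts = {}
    for n in range(max_len + 1):
        # For every factor of length n, collect the letters that follow it.
        followers = defaultdict(set)
        for i in range(len(word) - n):
            followers[word[i:i + n]].add(word[i + n])
        counts[n] = sum(1 for letters in followers.values() if len(letters) >= 2)
    return counts


def left_special_counts(word, max_len=None):
    """Left special factors of `word` are right special factors of its reversal."""
    return right_special_counts(word[::-1], max_len)


def shuffled(word, seed=0):
    """Return a random permutation of the letters of `word` (same length and composition)."""
    letters = list(word)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    fragment = "ACGACT"  # toy word, not a real gene fragment
    print(right_special_counts(fragment))
    print(right_special_counts(shuffled(fragment)))
```

For the toy word ACGACT, the right special factors are the empty word, C, and AC, each followed by more than one distinct letter. Comparing the per-length counts of a real fragment with those of its shuffled version is the kind of comparison the abstract describes. This direct approach takes time roughly quadratic in the word length, which is adequate for fragments of a few thousand bases.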
