Abstract

Linguistic (word-count) analysis of prokaryotic genome sequences by Shannon N-gram extension reveals that the dominant hidden motifs in A + T-rich genomes are T(A)(T)A and G(A)(T)C, with a variable number of repeating A and T bases. Since prokaryotic sequences are largely protein-coding, the motifs would correspond to amphipathic alpha-helices with alternating lysine and phenylalanine as the preferred polar and non-polar residues. The motifs are also known in eukaryotes as nucleosome positioning patterns. Their presence in prokaryotes may likewise serve the binding of histone-like proteins to DNA. If so, these patterns may be regarded as “anticipated” nucleosome positioning patterns that quite likely existed in prokaryotic genomes before the evolutionary separation of eukaryotes and prokaryotes.
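To make the word-count idea concrete, the following minimal sketch counts overlapping words of length k in a DNA string and translates a toy A-run/T-run sequence. It is a plain k-mer count, not the Shannon N-gram extension procedure itself. The function names `count_kmers` and `translate_toy`, the two-entry codon table, and the toy sequence are illustrative assumptions; only the codon assignments AAA (lysine) and TTT (phenylalanine) come from the standard genetic code.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping words of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Two codons only; both assignments are from the standard genetic code.
CODONS = {"AAA": "K", "TTT": "F"}  # K = lysine, F = phenylalanine


def translate_toy(seq):
    """Translate in reading frame 0; codons outside the toy table become 'X'."""
    seq = seq.upper()
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


toy = "AAATTTAAATTT"  # hypothetical sequence, not data from the paper
print(count_kmers(toy, 3))
print(translate_toy(toy))
```

On this toy input, alternating runs of A and T read in frame give alternating lysine and phenylalanine codons, which is the kind of correspondence the abstract describes between the DNA motifs and amphipathic helices.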

