In this paper we view the folding of polynucleotide (RNA) sequences as a map that assigns to each sequence a minimum-free-energy pattern of base pairings, known as secondary structure. Considering only the free energy leads to an energy landscape over the sequence space. Taking into account structure generates a less visualizable nonscalar ``landscape,'' where a sequence space is mapped into a space of discrete ``shapes.'' We investigate the statistical features of both types of landscapes by computing autocorrelation functions, as well as distributions of energy and structure distances, as a function of distance in sequence space. RNA folding is characterized by very short structure correlation lengths compared to the diameter of the sequence space. The correlation lengths depend strongly on the size and the pairing rules of the underlying nucleotide alphabet. Our data suggest that almost every minimum-free-energy structure is found within a small neighborhood of any random sequence. The interest in such landscapes results from the fact that they govern natural and artificial processes of optimization by mutation and selection. Simple statistical model landscapes, like Kauffman and Levin's n-k model [J. Theor. Biol. 128, 11 (1987)], are often used as a proxy for understanding realistic landscapes, like those induced by RNA folding. We make a detailed comparison between the energy landscapes derived from RNA folding and those obtained from the n-k model. We derive autocorrelation functions for several variants of the n-k model, and briefly summarize work on its fine structure. The comparison leads to an estimate for k=7--8, independent of n, where n is the chain length. While the scaling behaviors agree, the fine structure is considerably different in the two cases. The reason is seen to be the extremely high frequency of neutral neighbors, that is, neighbors with identical energy (or structure), in the RNA case.
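As an illustration of the autocorrelation measurements described above, the following is a minimal sketch of how a landscape autocorrelation function might be estimated by sampling on an n-k landscape. It is not the authors' procedure. The binary alphabet, the adjacent (periodic) choice of the k interacting sites, the uniformly random site contributions, and the names `make_nk_landscape` and `autocorrelation` are all assumptions made for illustration.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Build a fitness function for a binary n-k landscape (illustrative).

    Assumption: site i interacts with itself and the next k sites,
    with periodic boundary conditions (adjacent neighborhoods).
    """
    rng = random.Random(seed)
    neighbors = [[(i + j) % n for j in range(k + 1)] for i in range(n)]
    # One table of 2^(k+1) random contributions per site.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in neighbors[i]:
                idx = (idx << 1) | x[j]  # encode the local configuration
            total += tables[i][idx]
        return total / n  # average of the site contributions

    return fitness


def autocorrelation(fitness, n, d, samples=2000, seed=1):
    """Estimate the correlation of fitness values at Hamming distance d."""
    rng = random.Random(seed)
    fx, fy = [], []
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        y = list(x)
        for pos in rng.sample(range(n), d):  # flip exactly d distinct sites
            y[pos] ^= 1
        fx.append(fitness(x))
        fy.append(fitness(y))
    mean = (sum(fx) + sum(fy)) / (2 * samples)
    var = (sum((v - mean) ** 2 for v in fx)
           + sum((v - mean) ** 2 for v in fy)) / (2 * samples)
    cov = sum((a - mean) * (b - mean) for a, b in zip(fx, fy)) / samples
    return cov / var


if __name__ == "__main__":
    n, k = 20, 2
    f = make_nk_landscape(n, k)
    for d in (0, 1, 2, 5, 10):
        print(d, round(autocorrelation(f, n, d), 3))
```

The estimate pairs each random sequence with a mutant at exactly Hamming distance d, so the decay of the returned value with d gives a rough picture of the correlation length. In this sketch, larger k should produce faster decay, which is the kind of behavior the comparison with RNA energy landscapes relies on.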