Abstract

Some complications of the “overlapping” hypothesis of nucleic acid-protein coding are discussed, and a procedure is outlined for elucidating such a code using a minimum of assumptions concerning its detailed structure. A non-overlapping code would be unsolvable from published protein sequence data alone. An overlapping code would, however, impose constraints on adjacent linkages, such that not all mathematically possible sequences could be produced. There are a large number of ways in which the known sequence data can be organized so as to develop patterns which would be non-random if such constraints occur. Finally, if these results are positive, and given enough original data, it should be possible to use such patterns to predict the sequences which are missing from the data but consistent with it. From these, a detailed formulation of the code should eventually be possible. The first result of this approach shows that the reported “alleles” do not occur at random; certain pairs occur with greater than chance frequency. This may be interpreted as a “constraint” imposed by the context of the adjacent links.
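
The constraint argument can be illustrated with a small sketch. It is not the procedure of the paper: it assumes a fully overlapping triplet code over four bases, in which adjacent amino acids share two nucleotides, and it uses a randomly generated, purely hypothetical triplet-to-amino-acid assignment (the names `random_code` and `allowed_pairs` are illustrative). Under that assumption every adjacent pair is determined by four consecutive nucleotides, so at most 4^4 = 256 of the 20 × 20 = 400 conceivable adjacent pairs can be produced, whatever the assignment.

```python
from itertools import product
import random

BASES = "ACGU"   # four nucleotides
N_AMINO = 20     # amino acids, labelled 0..19 for illustration


def random_code(seed=0):
    """Hypothetical assignment of each of the 64 triplets to an amino acid."""
    rng = random.Random(seed)
    triplets = ["".join(t) for t in product(BASES, repeat=3)]
    return {t: rng.randrange(N_AMINO) for t in triplets}


def allowed_pairs(code):
    """Adjacent amino-acid pairs producible when neighbouring triplets
    overlap by two nucleotides (assumed fully overlapping code)."""
    pairs = set()
    for quad in product(BASES, repeat=4):
        s = "".join(quad)
        # First residue reads bases 1-3, second residue reads bases 2-4.
        pairs.add((code[s[:3]], code[s[1:]]))
    return pairs


code = random_code()
pairs = allowed_pairs(code)
print(len(pairs), "of", N_AMINO ** 2, "adjacent pairs are producible")
```

Pairs absent from `pairs` correspond to the adjacent linkages that such a code would forbid. A non-overlapping code imposes no comparable restriction, which is why sequence data alone could not resolve it.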
