Abstract

We consider the problem of designing DNA codes, namely sets of equi-length words over the alphabet [A, C, G, T] that satisfy certain combinatorial constraints. This problem is motivated by the task of reliably storing and retrieving information in synthetic DNA strands for use in DNA computing or as molecular bar codes in chemical libraries. The primary constraints that we consider, defined with respect to a parameter d, are as follows: for every pair of words w, x in a code, there are at least d mismatches between w and x if w not equal x and also between the reverse of w and the Watson-Crick complement of x. Extending classical results from coding theory, we present several upper and lower bounds on the maximum size of such DNA codes and give methods for constructing such codes. An additional constraint that is relevant to the design of DNA codes is that the free energies and enthalpies of the code words, and thus the melting temperatures, be similar. We describe dynamic programming algorithms that can (a) calculate the total number of words of length n whose free energy value, as approximated by a formula of Breslauer et al. (1986) falls in a given range, and (b) output a random such word. These algorithms are intended for use in heuristic algorithms for constructing DNA codes.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.