Abstract

Gene pairs that overlap in their coding regions are rare except in viruses. They may occur transiently in gene creation and are of biotechnological interest. We have examined the possibility of encoding an arbitrary pair of protein domains as a dual gene, with the shorter coding sequence completely embedded in the longer one. For 500 × 500 domain pairs (X, Y), we computationally designed homologous pairs (X′, Y′) coded this way, using an algorithm that provably maximizes the sequence similarity between (X′, Y′) and (X, Y). Three schemes were considered, with X′ and Y′ coded on the same or complementary strands. For 16% of the pairs, an overlapping coding exists where the level of homology of X′, Y′ to the natural proteins represents an E-value of 10⁻¹⁰ or better. Thus, for an arbitrary domain pair, it is surprisingly easy to design homologous sequences that can be encoded as a fully overlapping gene pair. The algorithm is general and was used to design 200 triple genes, with three proteins encoded by the same DNA segment. The ease of design suggests that overlapping genes may have occurred frequently in evolution and could be readily used to compress or constrain artificial genomes.

Highlights

  • Overlapping gene pairs are found in many viruses and organisms[1,2,3,4,5]

  • To quantify in a general way the difficulty of creating overlapping genes, we have examined the possibility of encoding an arbitrary pair of protein domains in a single DNA segment, as a fully overlapping dual gene

  • We considered 500 protein domains from the Pfam database[18], each 70–100 amino acids long, and all 125,250 corresponding domain pairs (X, Y). 44 of the proteins (9%) were viral proteins


Summary

Introduction

Overlapping gene pairs are found in many viruses and organisms[1,2,3,4,5]. In higher organisms, the overlapping regions usually involve introns, 5′- or 3′-untranslated regions, or very short protein coding segments. If the proteins are coded on the same strand in different reading frames, each nucleotide is part of two overlapping codons; if they are coded on opposite strands, the codons used for each protein must base pair. These constraints affect the rate of genetic drift, limit the ability to become optimally adapted, and presumably explain the counterselection of such sequences. A special case of this hypothesis is that peptide ligands might have arisen for some proteins from the antisense strand complementary to their coding region[11,12]. In addition to their biological importance, overlapping genes could be of significant interest in biotechnology, to compress or constrain artificial genomes. An example of a designed overlapping gene was produced recently, where each DNA strand coded for a simplified but functional aminoacyl-tRNA synthetase “Urzyme”[13,14,15,16], with the two enzymes being homologous to the two modern synthetase classes.
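The coupling between overlapping reading frames described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the design algorithm of the paper: it uses the standard genetic code, and the helper names (`translate`, `reverse_complement`) and the 12-nucleotide example sequence are hypothetical. It reads one DNA segment in frame 0, in frame +1 on the same strand, and on the complementary strand.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna from the given offset; trailing partial codons are ignored."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical example segment (an assumption, not taken from the paper).
segment = "ATGGCCAAGTGA"

print(translate(segment, 0))                    # frame 0, same strand
print(translate(segment, 1))                    # frame +1, same strand
print(translate(reverse_complement(segment)))   # complementary strand
```

For this example segment the three readings are `MAK*`, `WPS`, and `SLGH`. Changing any single nucleotide changes a codon in every one of these readings at once, which is the constraint that a dual-gene design has to satisfy.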

