Abstract
De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we find that it can explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the frequency of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
Highlights
Proteins without any detectable homology are often referred to as orphans
We show that the Guanine and Cytosine (GC) content of a genome is of great importance for the properties of an orphan protein
To identify the origin of the different properties of orphan and ancient proteins in different organisms, we studied the distribution of several structural properties (including low complexity, fraction of transmembrane residues, secondary structure frequency and intrinsic disorder) for all genomes against the GC content of the genomes; see Fig 3
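The claim that orphan proteins resemble random proteins at a given GC level can be illustrated with a small calculation. The sketch below is not the authors' method. It assumes a simple null model in which each nucleotide is drawn independently with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2, translates codons with the standard genetic code, and excludes stop codons. The set of disorder-promoting residues (A, R, G, Q, S, P, E, K) is one commonly used classification and is an assumption here, as is the function name `expected_disorder_fraction`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: one commonly used set of disorder-promoting residues.
DISORDER_PROMOTING = frozenset("ARGQSPEK")


def expected_disorder_fraction(gc):
    """Expected fraction of disorder-promoting residues in a random ORF.

    Hypothetical null model: independent nucleotides with
    P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
    Stop codons are excluded and the remaining codons renormalised.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

    coding_mass = 0.0    # total probability of non-stop codons
    disorder_mass = 0.0  # probability of codons for disorder-promoting residues
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        p = base_prob[codon[0]] * base_prob[codon[1]] * base_prob[codon[2]]
        coding_mass += p
        if aa in DISORDER_PROMOTING:
            disorder_mass += p
    return disorder_mass / coding_mass


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC = {gc:.1f}: {expected_disorder_fraction(gc):.3f}")
```

Under these assumptions the expected fraction rises with GC: at GC = 0.5 all 61 sense codons are equally likely and 30 of them code for residues in the assumed set, while at GC = 1 only codons for P, R, A and G remain. This is the direction of the effect described above, but real genomes deviate from such a uniform model through codon bias and strand asymmetry.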
Summary
The presence of orphans can be attributed to several causes: rapid sequence divergence beyond the point of homology recognition [1, 2], lateral transfer of genetic material [3], and de novo gene creation [4]. The latter is of particular interest, as it is a source of completely novel coding material. It has later been shown that some of these proteins were not created de novo but were instead classified as orphans as a result of limited phylogenetic coverage in earlier studies [9].