Abstract

Over the last few years, there has been increasing evidence for the de novo emergence of protein-coding genes, i.e., their emergence out of non-coding DNA. Here, we review the current literature and summarize the state of the field. We focus specifically on open questions and challenges in the study of de novo protein-coding genes, such as the identification and verification of de novo-emerged genes. The greatest obstacle to date is the lack of high-quality genomic data with very short divergence times, which could help pinpoint where a de novo gene originated. We conclude that, while there is plenty of evidence from a genetics perspective, functional studies of bona fide de novo genes are lacking, and almost nothing is known about their protein structures or how these structures arise during the emergence of de novo protein-coding genes. We suggest that future studies concentrate on the functional and structural characterization of de novo protein-coding genes, as well as on the detailed study of how functional de novo protein-coding genes emerge.

Highlights

  • The question of how new genes come about has been a major research theme in evolutionary biology since the discovery that different species’ genomes contain varying numbers of genes

  • In recent years, an increasing number of studies have confirmed a major role of de novo gene emergence in the evolution of new protein-coding genes

  • The functional description of de novo-emerged genes is still lacking, but more general findings for orphan genes suggest that novel genes have a broad functional potential


Summary

Introduction

The question of how new genes come about has been a major research theme in evolutionary biology since the discovery that different species’ genomes contain varying numbers of genes.

If most confirmed de novo genes fold, but most intergenic ORFs do not possess folding potential, then folding potential would be a bottleneck for de novo protein-coding gene emergence and retention. Another unsolved problem is how to find specific annotation thresholds for orphan/de novo genes[4]. Recent research has already shown that small ORFs (smORFs) can play a functional role[62,63], and it seems quite likely that very short novel ORFs could also be functional. This question touches upon the problem of differentiating lncRNAs from protein-coding genes, which is often performed via an ORF length cutoff[17,32]. Two closely related questions are how and when de novo proteins gain their function: are de novo genes usually functional from the moment they emerge, or do they gain a cellular task only after a period of drift?
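The review does not prescribe a specific procedure for the ORF length cutoff. The following is a minimal Python sketch of such a rule, assuming forward-strand scanning only, ATG as the sole start codon, the standard stop codons (TAA, TAG, TGA), and a hypothetical threshold of 100 codons chosen purely for illustration. The function names `longest_orf_codons` and `classify_transcript` are illustrative, not taken from any published tool.

```python
# Standard stop codons; ATG is assumed to be the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length in codons (stop codon excluded) of the longest
    complete ATG-initiated ORF on the forward strand of seq.
    ORFs that never reach a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def classify_transcript(seq: str, min_codons: int = 100) -> str:
    """Hypothetical cutoff rule: call a transcript putatively coding if its
    longest complete ORF reaches min_codons, otherwise putatively non-coding."""
    return "coding" if longest_orf_codons(seq) >= min_codons else "non-coding"


if __name__ == "__main__":
    short_orf = "ATG" + "GCT" * 5 + "TAA"   # 6-codon ORF
    long_orf = "ATG" + "GCT" * 120 + "TGA"  # 121-codon ORF
    print(longest_orf_codons(short_orf), classify_transcript(short_orf))
    print(longest_orf_codons(long_orf), classify_transcript(long_orf))
```

Any fixed threshold of this kind labels a short but genuinely translated ORF as non-coding, which is exactly where functional smORFs, and possibly very short de novo ORFs, can be missed.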

Conclusions

