Abstract

Taxonomically restricted genes (TRGs) are genes present in only one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNAs, introns, alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not previously been reported as having originated de novo, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.

Highlights

  • Some genes are present only in one clade, and are called taxonomically restricted genes (TRGs)

  • We look for taxonomically restricted gene families (TRGFs) that emerged after the split of the simulans-sechellia-melanogaster clade from the yakuba-erecta clade and before the speciation of D. simulans and D. sechellia (Figure 1)

  • The five species we study in the Drosophila melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta) had a common ancestor ∼3.3 Mya (Obbard et al, 2012) (Figure 1)
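The phylogenetic criterion in the highlights can be illustrated with a minimal presence/absence sketch. This is not the authors' pipeline: the function name, the family identifiers, and the toy data are hypothetical, and treating D. melanogaster as optional is an assumption made here for illustration. It covers only the first filtering step; the homology checks against noncoding sequences described in the abstract are not modeled.

```python
# Hypothetical sketch: keep gene families whose species distribution is
# consistent with origin after the split from the yakuba-erecta clade
# and before the speciation of D. simulans and D. sechellia.

REQUIRED = {"D. simulans", "D. sechellia"}   # must both carry the family
OUTGROUP = {"D. yakuba", "D. erecta"}        # must both lack the family


def is_candidate_trgf(species_with_gene):
    """Return True if the presence pattern fits the criterion.

    D. melanogaster is neither required nor excluded (an assumption).
    """
    present = set(species_with_gene)
    return REQUIRED <= present and not (present & OUTGROUP)


# Toy input: family name -> species in which a member was annotated.
families = {
    "fam_A": {"D. simulans", "D. sechellia"},
    "fam_B": {"D. simulans", "D. sechellia", "D. yakuba"},
    "fam_C": {"D. simulans"},
    "fam_D": {"D. melanogaster", "D. simulans", "D. sechellia"},
}

candidates = sorted(
    name for name, species in families.items() if is_candidate_trgf(species)
)
print(candidates)
```

In this toy data set, only the families found in both D. simulans and D. sechellia and in neither outgroup species pass the filter. A pattern like this is necessary but not sufficient: apparent absence from an outgroup can also reflect gene loss or a missed annotation, which is why the abstract stresses examining homologous noncoding sequences.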

Summary

Introduction

Some genes are present only in one clade, and are called taxonomically restricted genes (TRGs). They are also referred to as orphans or novel genes. We use the origin version of the selected effect definition of function (Linquist et al, 2020) to determine when a sequence becomes a protein-coding gene. This means that de novo birth occurs at the moment beyond which a mutation leading to loss of the protein product would have a negative effect on fitness. Protein-coding genes may evolve de novo from noncoding regions (Vakirlis et al, 2017; McLysaght and Guerzoni, 2015), in alternative frames of established genes (Willis and Masel, 2018; Guan et al, 2018), or as a result of genome rearrangement (Chen et al, 2015; Stewart and Rogers, 2019).
