Abstract

Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of diverse organisms. However, our knowledge of the origin and evolution of de novo genes is still limited. To systematically characterize the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. The pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts, and it identified 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. By further combining information on CDS positions and transcript structures, we showed that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of those ancient genes. Fourteen of the identified de novo genes with high transcript levels were selected to verify protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five of the encoded proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and can potentially be integrated quickly into ancient cellular networks.