Abstract
BackgroundCurrent genome sequencing projects reveal substantial numbers of taxonomically restricted, so called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs.ResultsHere, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129–7997 (42–81 %) of predicted orphan genes are expressed and 3818–7545 (39–76 %) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10 % of orphan genes.ConclusionsOur study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.
Highlights
Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so called orphan genes that lack homology with genes from other evolutionary lineages
This indicates that a subset of orphan genes is differentially expressed under various conditions and implies that such an orphan gene must have persisted a sufficient amount of time in a genome for it to be integrated into a regulatory circuit
Our study investigates the nature of orphan genes, which lack homology in other evolutionary lineages
Summary
Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so called orphan genes that lack homology with genes from other evolutionary lineages. Genes for which similar functions are described in one species (e.g. mouse) usually represent good candidate genes in other species of interest (e.g. human), still such strategy leads to an inherent bias in a way that only conserved genes with identifiable orthologs are pursued by functional follow up studies This practice bears the danger of ignoring a large number of genes that either do not have identifiable one-toone orthologs (e.g. 25 % of genes between human and mouse [4]) or completely lack any traces of homology in. Despite its relatively large size (160Mb), in terms of contiguity it is either superior or at least comparable (N50 = 1.2Mb) to many other recently published nematode genomes [16] It has proven of great use for several studies to identify and clone causative genes for various different phenotypes [14, 20, 24]. Various gene finding programs predicted 20,000-30,000 genes that have been evaluated several times using transcriptomic and comparative genomic data indicating that current gene annotations are largely complete [21, 23, 25]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.