Abstract

In genome annotations, a considerable fraction of putative proteins show no sequence similarity to known proteins. These genes or open reading frames (ORFs) have been referred to as “orphans”. Some of these putative proteins may have crucial organism-specific functions. Conversely, it has been reported that some of the annotated genes in sequenced bacterial genomes are not actually protein-coding genes, but rather ORFs occurring by chance [5]. Genes without similarity to sequences in the public databases have been predicted by computer programs, which generally employ statistical measures of DNA sequences. For short ORFs, however, these measures discriminate less reliably. Some groups have re-annotated the genomes of particular organisms [1, 2, 3, 4] using different approaches. In the present study, we found a linear relationship between the proportion of orphan ORFs in a genome and its evolutionary distance from the closest organism in the dataset, and we estimated the number of orphan genes in each genome based on this correlation.
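
The linear relationship described above can be illustrated with an ordinary least-squares fit. The following is a minimal sketch only, not the procedure used in the study: the function names `fit_line` and `expected_orphan_count`, the sample values, and the step that converts a fitted fraction into a count are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all distances are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_orphan_count(distance, total_orfs, slope, intercept):
    """Orphan ORF count implied by the fitted line (hypothetical helper).

    The fitted fraction is clamped to [0, 1] before scaling by the
    total number of ORFs in the genome.
    """
    fraction = min(max(slope * distance + intercept, 0.0), 1.0)
    return fraction * total_orfs


# Hypothetical values, NOT taken from the study:
# evolutionary distance to the closest organism in the dataset,
# and the observed proportion of orphan ORFs in each genome.
distances = [0.1, 0.2, 0.3, 0.4]
orphan_fractions = [0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(distances, orphan_fractions)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Under these assumptions, a genome's observed orphan ORF count could be compared with the value returned by `expected_orphan_count` for its distance; how the study itself derives the number of orphan genes from the correlation is not specified in the abstract.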
