Abstract
It is recognized that a large proportion of eukaryotic RNAs and proteins is not produced from conventional genes but from short and alternative (alt) open reading frames (ORFs) that are not captured by gene prediction programs. Here we present an in silico prediction of altORFs that applies several selection filters based on evolutionary conservation and on annotations of previously characterized altORF peptides. Our work was performed in the Bithorax-complex (BX-C), one of the first genomic regions described to contain long non-coding RNAs in Drosophila. We show that several altORFs can be predicted from both coding and non-coding sequences of BX-C. In addition, the selected altORFs encode proteins with several interesting molecular features, such as transmembrane helices or a general enrichment in short interaction motifs. Of particular interest, one altORF encodes a protein that contains a peptide sequence found in specific isoforms of two Drosophila Hox proteins. Our work thus suggests that several altORF proteins could be produced from a genomic region known for its critical role during Drosophila embryonic development. The molecular signatures of these altORF proteins further suggest that several of them could engage in numerous protein–protein interactions and be of functional importance in vivo.
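As a rough illustration of the starting point for such a prediction, the sketch below scans a DNA sequence in all six reading frames and reports every ATG-to-stop ORF above a minimum length. It is a minimal example under stated assumptions, not the pipeline used in this work: the function names, the `min_codons` threshold, and the restriction to ATG start codons are hypothetical choices, and the conservation and annotation filters described above are not reproduced.

```python
# Illustrative six-frame ORF scan. Names and thresholds are assumptions
# made for this sketch; they are not the filters used in the study.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=10):
    """Yield (strand, frame, start, end, orf_seq) for ATG-to-stop ORFs.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon. min_codons counts the codons
    from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            # Walk the frame codon by codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START:
                    orf_start = i  # first in-frame ATG opens the ORF
                elif orf_start is not None and codon in STOPS:
                    if (i - orf_start) // 3 >= min_codons:
                        yield strand, frame, orf_start, i + 3, s[orf_start:i + 3]
                    orf_start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    # Toy sequence, not taken from BX-C.
    for orf in find_orfs("CCATGGCTGCTGCTTAAG", min_codons=3):
        print(orf)
```

Candidates returned by a scan like this would still need to pass downstream filters, such as the conservation and peptide-annotation criteria mentioned in the abstract, before being considered altORFs of interest.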
Highlights
In contrast to previous annotations of alternative ORFs (altORFs), our prediction analysis was not restricted to ncRNAs and mRNAs
To assess whether the predicted altORFs could encode peptides with potential molecular functions, we looked at several protein signatures
Our study revealed the presence of 48 altORFs with a high potential to encode functional alternative proteins in BX-C
Summary
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Classical gene prediction programs are based on automated intron–exon annotations and comparison with cDNA sequences and/or genes from different organisms [1,2,3,4]. These computational methods led to the general finding that a surprisingly small fraction of sequenced eukaryotic genomes (2/3% on average) corresponds to protein-encoding open reading frames (ORFs) [5]. The small number of these so-called “conventional”