Abstract

Long non-coding RNAs (LncRNAs) are non-protein coding transcripts more than 200 nucleotides in length. Deep sequencing technologies have unveiled lncRNAs can harbor translatable short open reading frames (sORFs). Yet the regulatory mechanisms governing lncRNA translation events remain poorly understood. Here, we exhaustively detected the sequence, functional element, and structure features relevant to lncRNA translation in human. Extensive identification and analysis reveal that translatable lncRNAs contain richer protein-coding related sequence features, cap-dependent and cap-independent translation initiation mechanisms, and more stable secondary structures, as compared to untranslatable lncRNAs. These findings strongly support lncRNAs serve as a repository for the production of new small peptides. Based on the feature fusion affecting translation and the extreme gradient boosting (XGBoost) algorithm, we developed the first computational tool that dedicated for predicting translatable lncRNAs, named TransLncPred. Benchmark experimental results show that our method outperforms several state-of-the-art RNA coding potential prediction tools on the same training and testing datasets. The 100-time 10-fold cross-validation tests also demonstrate that regulatory element-derived features, especially N7-methylguanosine (m7G) and internal ribosome entry site (IRES), contribute to the improvement in predictive performance.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.