Abstract
Determining the subcellular localization of long non-coding RNAs (lncRNAs) provides valuable clues to their function. As an alternative to time-consuming and expensive biochemical experiments, we develop lncLocPred, a machine learning predictor based on logistic regression, to predict the subcellular localization of lncRNAs. We adopt sequence features including k-mer, triplet, and PseDNC, and systematically perform feature selection with VarianceThreshold, binomial distribution, and F-score to obtain representative features. We observe that the top-ranked k-mers have a higher G and C content, in the form of short repeats. Our model improves prediction accuracy on several subcellular localizations and achieves an overall accuracy of 92.37% on the benchmark dataset under jackknife validation, higher than existing state-of-the-art predictors. lncLocPred also performs better on an independent dataset that we collected. Related experimental data and source code are available at https://github.com/jademyC1221/lncLocPred.
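As a rough illustration of the k-mer features and the G/C observation mentioned above, the following minimal sketch computes normalized k-mer frequencies for a sequence and the G/C fraction of a single k-mer. It is not taken from the lncLocPred repository; the function names `kmer_frequencies` and `gc_fraction` are hypothetical, and the normalization (counts divided by the number of valid k-mers) is an assumption.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k k-mers over the alphabet ACGT.

    Hypothetical helper: windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA notation alike
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def gc_fraction(kmer):
    """Fraction of G and C bases in a k-mer (hypothetical helper)."""
    return sum(base in "GC" for base in kmer) / len(kmer)


freqs = kmer_frequencies("GCGCGC", 2)
top = max(freqs, key=freqs.get)  # most frequent 2-mer in this toy sequence
```

In a pipeline like the one described, such frequency vectors would be one of several feature groups passed on to feature selection and the classifier.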
Highlights
Long non-coding RNAs, which are longer than 200 nucleotides [1], have become a research hotspot; their number is estimated at about 20,000 by ENCODE [2] or FANTOM5 [3]
We considered several sequence-derived features and proposed lncLocPred, a logistic regression-based method to predict the subcellular localization of long non-coding RNAs (lncRNAs)
According to the definition of PseDNC, the value of λ should not exceed L − 2, where L is the sequence length and 2 is the length of a dinucleotide
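The constraint on λ can be checked before feature extraction. In the usual PseDNC formulation, a sequence of length L contains L − 1 dinucleotides, and the λ-th tier correlation pairs dinucleotide i with dinucleotide i + λ, so at least one such pair exists only when λ ≤ L − 2. The sketch below, with the hypothetical helpers `max_lambda` and `validate_lambda`, derives the largest admissible λ for a dataset from its shortest sequence; it is an illustration under that reading, not the authors' code.

```python
def max_lambda(sequences):
    """Largest lambda usable for every sequence: shortest length minus 2."""
    shortest = min(len(s) for s in sequences)
    return shortest - 2


def validate_lambda(lam, sequences):
    """Raise ValueError if lam violates 1 <= lam <= L - 2 for any sequence."""
    limit = max_lambda(sequences)
    if lam < 1 or lam > limit:
        raise ValueError(f"lambda={lam} must lie in [1, {limit}]")
    return lam


toy = ["ACGTAC", "ACGT"]       # toy sequences, far shorter than real lncRNAs
limit = max_lambda(toy)        # shortest length 4, so the bound is 2
```

Because lncRNAs are longer than 200 nucleotides, this bound is rarely restrictive in practice, but it guards against truncated or malformed input sequences.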
Summary
Long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides [1], have become a research hotspot; their number is estimated at about 20,000 by ENCODE [2] or FANTOM5 [3]. They perform a variety of important biological functions that affect different biological processes [4]–[6]. Alterations in the expression level of lncRNAs, and the up-regulation or down-regulation of a novel lncRNA, have been shown to be diagnostic markers for several types of diseases and cancers, which contributes to the proposal of therapeutic strategies [7]–[9]. LncRNAs located in the nucleus perform regulatory functions including chromatin organization and transcriptional and post-transcriptional gene expression, and act as structural scaffolds of nucleus