Abstract

Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and subcellular localizations of lncRNAs depend on the tissues or cell lines that they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account, they train a unified machine learning model for pooled lncRNAs from all available cell lines. It is of importance to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. In this study, we present an updated cell-line-specific predictor lncLocator 2.0, which trains an end-to-end deep model per cell line, for predicting lncRNA subcellular localization from sequences. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into convolutional neural network, long short-term memory and multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0, and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. The lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.