Abstract

Long non-coding RNAs (lncRNAs) play a broad spectrum of distinctive regulatory roles through interactions with proteins. However, only a few plant lncRNAs have been experimentally characterized. We propose GPLPI, a graph representation learning method, to predict plant lncRNA-protein interaction (LPI) from sequence and structural information. GPLPI employs a generative model using long short-term memory (LSTM) with graph attention. Evolutionary features are extracted using frequency chaos game representation (FCGR). Manifold regularization and l2-norm are adopted to obtain discriminant feature representations and mitigate overfitting. The model captures locality preserving and reconstruction constraints that lead to better generalization ability. Finally, potential interactions between lncRNAs and proteins are predicted by integrating catboost and regularized Logistic regression based on L-BFGS optimization algorithm. The method is trained and tested on Arabidopsis thaliana and Zea mays datasets. GPLPI achieves accuracies of 85.76% and 91.97% respectively. The results show that our method consistently outperforms other state-of-the-art methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call