Abstract
BackgroundIdentifying the interactions between proteins and long non-coding RNAs (lncRNAs) is of great importance to decipher the functional mechanisms of lncRNAs. However, current experimental techniques for detection of lncRNA-protein interactions are limited and inefficient. Many methods have been proposed to predict protein-lncRNA interactions, but few studies make use of the topological information of heterogenous biological networks associated with the lncRNAs.ResultsIn this work, we propose a novel approach, PLIPCOM, using two groups of network features to detect protein-lncRNA interactions. In particular, diffusion features and HeteSim features are extracted from protein-lncRNA heterogenous network, and then combined to build the prediction model using the Gradient Tree Boosting (GTB) algorithm. Our study highlights that the topological features of the heterogeneous network are crucial for predicting protein-lncRNA interactions. The cross-validation experiments on the benchmark dataset show that PLIPCOM method substantially outperformed previous state-of-the-art approaches in predicting protein-lncRNA interactions. We also prove the robustness of the proposed method on three unbalanced data sets. Moreover, our case studies demonstrate that our method is effective and reliable in predicting the interactions between lncRNAs and proteins.AvailabilityThe source code and supporting files are publicly available at: http://denglab.org/PLIPCOM/.
Highlights
Identifying the interactions between proteins and long non-coding RNAs is of great importance to decipher the functional mechanisms of lncRNAs
PLIPCOM incorporates (i) low dimensional diffusion features calculated using random walks with restart (RWR) and a dimension reduction approach (SVD), and (ii) HeteSim features obtained by computing the numbers of different paths from protein to lncRNA in the heterogeneous network
The improvements include: 1) We presented more detail of the methodology of PLIPCOM, such as the construction of protein-lncRNA heterogenous work, feature extraction and gradient tree boosting algorithm; 2) We have conducted extensive evaluation experiments to demonstrate the performance of the proposed method on multiple data sets with different positive and negative sample ratios, i.e. P:N=1:1,1:2,1:5,1:10, respectively
Summary
Long non-coding RNAs (lncRNAs) have been intensively investigated in recent years [1, 2], and show close connection to transcriptional regulation, RNA splicing, cell cycle and disease. The network propagation algorithms, such as the Katz measure [22], random walk with restart (RWR) [23], LPIHN [24] and PRINCE [25, 26], have been used to investigate the topological features of biomolecular networks in a variety of issues, such as disease-associated gene prioritization, drug repositioning and drug-target interaction prediction. PLIPCOM incorporates (i) low dimensional diffusion features calculated using random walks with restart (RWR) and a dimension reduction approach (SVD), and (ii) HeteSim features obtained by computing the numbers of different paths from protein to lncRNA in the heterogeneous network. The improvements include: 1) We presented more detail of the methodology of PLIPCOM, such as the construction of protein-lncRNA heterogenous work, feature extraction and gradient tree boosting algorithm; 2) We have conducted extensive evaluation experiments to demonstrate the performance of the proposed method on multiple data sets with different positive and negative sample ratios, i.e. P:N=1:1,1:2,1:5,1:10, respectively. We compared PLIPCOM with our previous method PLPIHS [27] on four independent test datasets, and the experimental results show that PLIPCOM significantly outperform our previous method; 3) To verify the effectiveness of the diffusion and HeteSim features in predicting proteinlncRNA interactions, we evaluated the predictive performance of the two types of features alone and combination of them, on the benchmark dataset; 4) Case studies have been described to show that our method is effective and reliable in predicting the interactions between lncRNAs and proteins; 5) Last but not the least, we have conducted the time complexity analysis of PLIPCOM
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.