Abstract

Interactions between genetic factors and environmental factors (EFs) play an important role in many diseases. Long non-coding RNAs (lncRNAs) are an important class of non-coding RNA that regulate many life processes, so the ability to predict associations between lncRNAs and EFs is of considerable practical significance. However, recent methods for predicting lncRNA-EF associations rarely use the topological information of heterogeneous biological networks, or they simply treat all objects as the same type without considering the different, subtle semantic meanings of the various paths in the heterogeneous network. To address this issue, this paper proposes GBDTL2E, a method based on the Gradient Boosting Decision Tree (GBDT) for predicting associations between lncRNAs and EFs. GBDTL2E integrates structural information with heterogeneous networks, combines Hetesim features and diffusion features through multi-feature fusion, and uses the GBDT machine learning algorithm to predict lncRNA-EF associations on the heterogeneous network. The experimental results demonstrate that the proposed algorithm achieves high performance.
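The Hetesim features mentioned above score how strongly a lncRNA and an EF are connected along a typed path (a meta-path) in the heterogeneous network. The sketch below is not the authors' code; it follows the published HeteSim definition for even-length meta-paths, assumes numpy, and uses hypothetical helper names (`row_normalize`, `hetesim_even`). Each half of the path becomes a product of row-normalized transition matrices, and the score is the cosine between the two reachable-probability vectors that meet at the middle object type. Odd-length paths need an extra edge-object decomposition that this sketch does not cover.

```python
import numpy as np


def row_normalize(m):
    """Turn a non-negative relation matrix into a row-stochastic transition matrix.

    Rows that sum to zero stay all-zero instead of dividing by zero.
    """
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def hetesim_even(left_relations, right_relations):
    """HeteSim scores for an even-length meta-path (hypothetical helper).

    left_relations:  relation matrices walking from the source type to the middle
                     type, e.g. [S_LL] for the path lncRNA-lncRNA-EF.
    right_relations: relation matrices walking from the target type back to the
                     middle type, e.g. [A_LE.T] (EF -> lncRNA).
    Returns an (n_source, n_target) matrix of scores in [0, 1].
    """
    # reachable probabilities from each source object to the middle type
    pm_left = row_normalize(left_relations[0])
    for r in left_relations[1:]:
        pm_left = pm_left @ row_normalize(r)
    # reachable probabilities from each target object to the middle type
    pm_right = row_normalize(right_relations[0])
    for r in right_relations[1:]:
        pm_right = pm_right @ row_normalize(r)
    # cosine similarity between the two sets of probability vectors
    raw = pm_left @ pm_right.T
    norms = np.outer(np.linalg.norm(pm_left, axis=1), np.linalg.norm(pm_right, axis=1))
    return np.divide(raw, norms, out=np.zeros_like(raw), where=norms > 0)
```

For example, with a lncRNA similarity matrix `S_LL` and a known lncRNA-EF association matrix `A_LE` (both hypothetical inputs here), `hetesim_even([S_LL], [A_LE.T])` scores every lncRNA-EF pair along the lncRNA-lncRNA-EF path.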

Highlights

  • The environment factor (EF) is a biological or non-biological factor that affects a living organism

  • We have proposed a high-performance method to predict the association between long non-coding RNAs (lncRNAs) and EFs based on heterogeneous networks

  • GBDTL2E, a method based on the Gradient Boosting Decision Tree (GBDT), has been proposed to predict the association between lncRNAs and EFs
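To make the GBDT training loop concrete (including the "Update mth weak model" step named in the methods outline), here is a minimal from-scratch sketch in Python with numpy. It is not the authors' implementation: it boosts depth-1 regression trees (stumps) on the log-loss pseudo-residuals `y - sigmoid(F)`, and its leaf values are plain residual means, a simplification of the Newton-step leaf values used in Friedman's binomial-deviance boosting. The function names and defaults (`n_models`, `lr`) are hypothetical; a real pipeline would more likely use a library such as scikit-learn's `GradientBoostingClassifier`.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a one-split regression tree (stump) to pseudo-residuals r by least squares."""
    # (sse, feature index, threshold, left value, right value); default is a constant model
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(x.shape[1]):
        # every distinct value except the largest is a candidate split threshold
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda z: np.where(z[:, j] <= t, lv, rv)


def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))


def gbdt_fit(x, y, n_models=50, lr=0.1):
    """Gradient boosting for binary labels y in {0, 1} under log-loss."""
    y = np.asarray(y, dtype=float)
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p / (1 - p))  # initial model F_0: log-odds of the positive class
    f = np.full(len(y), f0)
    models = []
    for _ in range(n_models):
        residual = y - sigmoid(f)  # negative gradient of log-loss w.r.t. F
        h = fit_stump(x, residual)  # fit the m-th weak model to the residuals
        f = f + lr * h(x)  # update: F_m = F_{m-1} + lr * h_m
        models.append(h)
    return f0, lr, models


def gbdt_predict_proba(x, model):
    """Predicted probability that each row of x is a positive (associated) pair."""
    f0, lr, models = model
    f = np.full(x.shape[0], f0)
    for h in models:
        f = f + lr * h(x)
    return sigmoid(f)
```

In the GBDTL2E setting, each row of `x` would presumably describe one lncRNA-EF pair, with its Hetesim and diffusion features concatenated (for example with `np.hstack`), and `y` would mark known associations; that feature layout is an assumption, not something the abstract specifies.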


Summary

INTRODUCTION

The environment factor (EF) is a biological or non-biological factor that affects a living organism. Li et al. (2019) presented a new computational model, the heterogeneous graph convolutional network (HGCNMDA), and Tang et al. (2019) proposed the double Laplace regularization matrix completion model (DLRMC). These studies have shown that computational models can effectively predict potential miRNA-disease associations and make verification experiments more convenient for biological researchers. BRWLDA, proposed by Yu et al. (2017), predicts lncRNA-disease associations using a double random walk on heterogeneous networks. However, existing methods for studying the associations between disease-related lncRNAs and EFs treat all objects as the same type, without considering the different, subtle semantic meanings of different paths in the heterogeneous network. This reduces the accuracy and persuasiveness of their results.

MATERIALS AND METHODS
Calculate Gaussian Interaction Profile Kernel Similarity
Calculate Chemical Structure Similarity
Obtain the Similarity Matrix
Obtain Low-Dimensional Network Diffusion Features
Obtain the Diffusion Features Using RWR
Calculate the Hetesim Score
Train the Gradient-Boosting Decision Tree Classifier
D: Update the mth Weak Model
GBDTL2E Algorithm
Data Sets
Performance Measures
Method
Performance Comparison With Different Topological Features
Performance Comparison With Existing Methods
Case Study
Findings
DATA AVAILABILITY STATEMENT
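The methods outline lists Gaussian interaction profile (GIP) kernel similarity and RWR-based, low-dimensional diffusion features among the steps. The sketch below (Python with numpy, not the authors' code) shows one standard form of each: the GIP kernel with bandwidth `gamma = gamma_prime / mean(||IP||^2)`, and random walk with restart iterated to convergence as `P <- (1 - alpha) T P + alpha I` on a column-normalized network. How the paper reduces RWR profiles to low-dimensional features is not stated in this section, so the truncated SVD in `diffusion_features` is an assumed stand-in; all function names and defaults (`gamma_prime=1.0`, `restart=0.5`, `dim`) are hypothetical.

```python
import numpy as np


def gip_kernel(a, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary association matrix.

    Row i of `a` is the interaction profile of entity i (e.g. a lncRNA's known EFs).
    Assumes `a` contains at least one nonzero entry.
    """
    a = np.asarray(a, dtype=float)
    sq_norms = (a ** 2).sum(axis=1)
    gamma = gamma_prime / sq_norms.mean()  # bandwidth scaled by the mean profile norm
    # squared Euclidean distances between all pairs of profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * a @ a.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def rwr(w, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart seeded at every node of a weighted network.

    Column j of the result is the steady-state visiting distribution of a walker
    that returns to node j with probability `restart` at each step.
    """
    w = np.asarray(w, dtype=float)
    col_sums = w.sum(axis=0, keepdims=True)
    # column-stochastic transition matrix (isolated nodes keep zero columns)
    t = np.divide(w, col_sums, out=np.zeros_like(w), where=col_sums > 0)
    p0 = np.eye(w.shape[0])
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (t @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def diffusion_features(w, dim=2, restart=0.5):
    """Compress each node's RWR profile to `dim` numbers with a truncated SVD (assumed step)."""
    p = rwr(w, restart)
    u, s, _ = np.linalg.svd(p.T)  # row i of p.T is node i's diffusion profile
    return u[:, :dim] * np.sqrt(s[:dim])
```

Both functions take plain adjacency or association matrices, so they could be applied to the lncRNA, EF, and lncRNA-EF layers of a heterogeneous network separately or to a combined block matrix; which variant the paper uses is not stated in this section.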