Abstract
In recent years, it has become increasingly clear that long non-coding RNAs (lncRNAs) can regulate their target genes at multiple levels, including the transcriptional and translational levels, and play key regulatory roles in many important biological processes, such as cell differentiation and chromatin remodeling. Inferring potential lncRNA-disease associations is essential for revealing the mechanisms underlying diseases, developing novel drugs, and optimizing personalized treatments. However, biological experiments to validate lncRNA-disease associations are very time-consuming and costly, so it is critical to develop effective computational models. In this study, we propose a matrix factorization method based on alternating least squares, referred to as ALSBMF, to predict lncRNA-disease associations. ALSBMF first decomposes the known lncRNA-disease association matrix into two feature matrices. It then defines an optimization function using disease semantic similarity, lncRNA functional similarity, and known lncRNA-disease associations, and solves for the two optimal feature matrices by the least squares method. Finally, the two optimal feature matrices are multiplied to reconstruct the scoring matrix, filling in the missing values of the original matrix to predict lncRNA-disease associations. Compared with existing methods, ALSBMF shares the advantages of BPLLDA: it does not require negative samples and can predict associations involving novel lncRNAs or novel diseases. In addition, this study performs leave-one-out cross-validation (LOOCV) and five-fold cross-validation to evaluate the prediction performance of ALSBMF. The resulting AUCs are 0.9501 and 0.9215, respectively, which are better than those of existing methods. Furthermore, colon cancer, kidney cancer, and liver cancer are selected as case studies. The predicted top three lncRNAs related to colon cancer, kidney cancer, and liver cancer were validated in the latest LncRNADisease database and the related literature.
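The abstract does not spell out ALSBMF's optimization function, so the following is only a minimal sketch under an assumed objective: a low-rank factorization Y ≈ U Vᵀ with L2 penalties plus graph-Laplacian penalties built from a lncRNA functional similarity matrix S_l and a disease semantic similarity matrix S_d (both assumed symmetric). The rank k, the weights lam, alpha, beta, and the function names are hypothetical, not the authors' settings or code. Fixing one factor makes the objective a strictly convex quadratic in the other, whose minimizer satisfies a Sylvester equation; alternating these exact solves is the alternating least squares scheme.

```python
import numpy as np


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S


def solve_sylvester_small(A, B, C):
    """Solve A X + X B = C for X by vectorization (only suitable for small inputs).

    Uses vec(A X) = (I kron A) vec(X) and vec(X B) = (B^T kron I) vec(X)
    with column-major (Fortran-order) vec.
    """
    n, k = C.shape
    M = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(M, C.flatten(order="F"))
    return x.reshape((n, k), order="F")


def als_mf(Y, S_l, S_d, k=2, lam=0.1, alpha=0.1, beta=0.1, iters=50, seed=0):
    """Alternating least squares for the ASSUMED objective
    ||Y - U V^T||_F^2 + lam (||U||_F^2 + ||V||_F^2)
      + alpha tr(U^T L_l U) + beta tr(V^T L_d V).
    Rows of U are lncRNA features; rows of V are disease features.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    L_l, L_d = laplacian(S_l), laplacian(S_d)
    I = np.eye(k)
    for _ in range(iters):
        # Fix V; zero gradient in U: alpha L_l U + U (V^T V + lam I) = Y V
        U = solve_sylvester_small(alpha * L_l, V.T @ V + lam * I, Y @ V)
        # Fix U; zero gradient in V: beta L_d V + V (U^T U + lam I) = Y^T U
        V = solve_sylvester_small(beta * L_d, U.T @ U + lam * I, Y.T @ U)
    return U, V
```

The reconstructed scoring matrix is then `U @ V.T`; its entries at positions where Y is zero serve as predicted association scores. The Kronecker-based solver grows cubically in matrix size, so for realistic data `scipy.linalg.solve_sylvester` would be the more practical choice.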
To test the ability of ALSBMF to predict lncRNAs associated with novel diseases and diseases associated with new lncRNAs, all known associations of the tested diseases and lncRNAs were removed. The predicted top five lncRNAs related to breast cancer and nasopharyngeal carcinoma, and the predicted top five cancers related to the lncRNAs H19 and MALAT1, were then validated in PubMed and dbSNP.
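The masking-and-ranking protocol for a novel disease can be sketched as follows. ALSBMF itself is not reproduced here: a simple disease-similarity-weighted average stands in as the scorer, purely to show why every known association of the tested disease must be hidden before scoring (otherwise the scorer could read the answer from the disease's own column). The function names and toy inputs are hypothetical.

```python
def similarity_weighted_scores(Y, S_d, d):
    """Stand-in scorer (NOT ALSBMF): score each lncRNA for disease d as the
    similarity-weighted average of its associations across all diseases."""
    weights = S_d[d]
    total = sum(weights) or 1.0
    return [sum(w * y for w, y in zip(weights, row)) / total for row in Y]


def de_novo_top_k(Y, S_d, d, k=5):
    """Hide every known association of disease d, re-score, and return the
    indices of the k highest-scoring lncRNAs."""
    masked = [row[:] for row in Y]  # copy so the caller's matrix is untouched
    for row in masked:
        row[d] = 0  # disease d now looks like a novel disease with no known lncRNAs
    scores = similarity_weighted_scores(masked, S_d, d)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

The returned indices play the role of the "top five" candidates, which would then be checked against external evidence such as the literature. The same masking applies when predicting diseases for a new lncRNA, by zeroing that lncRNA's row instead of a disease column.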
Highlights
Sequence analysis of the human genome identified only 20,000 coding sequences that can be translated into proteins, and these coding sequences account for less than 2% of the human genome (Paul et al, 2004)
The associate editor coordinating the review of this manuscript and approving it for publication was Kin Fong Lei.
The two optimal feature matrices are multiplied to reconstruct the scoring matrix, filling in the missing values of the original matrix to predict long non-coding RNA (lncRNA)-disease associations
Biological experiments have been the primary method for identifying lncRNA-disease associations
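The step of "filling the missing values" by multiplying the two feature matrices can be illustrated with a small pure-Python sketch. Here U (lncRNA features), V (disease features), and Y (known associations, 1 = known, 0 = unknown) are hypothetical toy inputs, and the function names are illustrative only: the reconstructed score R = U Vᵀ is computed, and only the lncRNAs whose entry in Y is zero are ranked as candidates.

```python
def reconstruct(U, V):
    """Scoring matrix R = U V^T, i.e. R[i][j] = sum_k U[i][k] * V[j][k]."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def rank_candidates(Y, R, disease):
    """Rank lncRNAs with no known association (Y == 0) to `disease`
    by their reconstructed score, highest first."""
    candidates = [i for i in range(len(Y)) if Y[i][disease] == 0]
    return sorted(candidates, key=lambda i: R[i][disease], reverse=True)
```

For example, with U = [[1, 0], [0, 1], [1, 1]] and V = [[1, 0], [0, 2]], `reconstruct` yields [[1, 0], [0, 2], [1, 2]]; if only lncRNA 0 is known to be linked to disease 0, the unknown lncRNAs 2 and 1 are ranked in that order.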
Summary
Sequence analysis of the human genome identified only 20,000 coding sequences that can be translated into proteins.
VOLUME 8, 2020
A disadvantage of this method is that it requires information on negative samples, which are unknown in this field of study. To solve this problem, Chen et al identified candidate lncRNA-disease associations by establishing a Laplacian regularized least squares method within a semi-supervised learning framework (Chen and Yan, 2013). Sun et al (Sun et al, 2014) put forward a computational method called RWRlncD based on known lncRNA-disease associations, lncRNA similarity, and disease similarity. This algorithm performs a random walk with restart (RWR) on the lncRNA functional similarity network to capture latent lncRNA-disease associations. GrwLDA has some drawbacks, such as the difficulty of choosing optimal parameters. In both of the above parts, all computational models require known lncRNA-disease associations for prediction. This method shows excellent prediction performance in the experimental results.
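A random walk with restart on a similarity network, as described for RWRlncD, can be sketched as a power iteration p ← (1 − r)·P·p + r·p₀, where P is the column-normalized similarity matrix and p₀ spreads the restart mass uniformly over seed lncRNAs already known to be associated with the disease of interest. The restart probability r = 0.7, the column normalization, the uniform seed vector, and the function name `rwr` are assumptions for illustration, not the published RWRlncD settings.

```python
def rwr(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a non-negative similarity network W
    (list of lists). `seeds` are node indices the walker restarts from.
    Returns the steady-state visiting probability of every node."""
    n = len(W)
    # Column-normalize W into a transition matrix P (isolated columns stay zero).
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    P = [[W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart distribution: uniform over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(P[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:  # L1 convergence check
            return nxt
        p = nxt
    return p
```

Non-seed lncRNAs with high steady-state probability are the candidates for new associations. Because (1 − r)·P is a contraction when r > 0, the iteration converges regardless of the starting vector.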