Abstract

In the past decade, hundreds of long non-coding RNAs (lncRNAs) have been identified as significant players in diverse types of cancer; however, the functions and mechanisms of most lncRNAs in cancer remain unclear. Several computational methods have been developed to detect associations between cancer and lncRNAs, yet those approaches have limitations in both sensitivity and specificity. To improve the accuracy of predicted lncRNA-cancer associations, we upgraded our previously developed cancer-related lncRNA classifier, CRlncRC, to generate CRlncRC2. CRlncRC2 is built on the eXtreme Gradient Boosting (XGBoost) machine learning framework and incorporates Synthetic Minority Over-sampling Technique (SMOTE)-based over-sampling together with Laplacian Score-based feature selection. Ten-fold cross-validation showed that the area under the ROC curve (AUC) of CRlncRC2 for identifying cancer-related lncRNAs is much higher than the values previously reported for CRlncRC and other methods. Compared with CRlncRC, the number of features used by CRlncRC2 dropped from 85 to 51. Finally, we identified 439 cancer-related lncRNA candidates using CRlncRC2. To evaluate the accuracy of these predictions, we first consulted the cancer-related long non-coding RNA database Lnc2Cancer v2.0 and the relevant literature for supporting information, then conducted statistical analyses of somatic mutations, distance from cancer genes, and differential expression in tumor tissues, using various data sets. The results showed that our approach was highly reliable for identifying cancer-related lncRNA candidates. Notably, the highest-ranked candidate, lncRNA AC074117.1, has not been reported previously; however, integrated multi-omics analyses demonstrate that it is the target of multiple cancer-related miRNAs and interacts with adjacent protein-coding genes, suggesting that it may act as a cancer-related competing endogenous RNA, which warrants further investigation.
In conclusion, CRlncRC2 is an effective and accurate method for identifying cancer-related lncRNAs, and has the potential to contribute to the functional annotation of lncRNAs and to guide cancer therapy.
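
The SMOTE-based over-sampling step named in the abstract can be illustrated with a minimal sketch. It shows the core idea only: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy feature values are assumptions made for illustration and are not taken from CRlncRC2; a real pipeline would more likely rely on an existing implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values): 3 cancer-related rows vs 8 unrelated rows.
positives = [[0.9, 0.8], [1.0, 1.0], [0.8, 1.1]]
negatives = [[0.1 * i, 0.05 * i] for i in range(8)]

extra = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + extra
print(len(balanced_positives), len(negatives))  # 8 8
```

Because every synthetic point is a convex combination of two real minority samples, the new rows stay inside the region already occupied by the minority class. Over-sampling of this kind is normally applied to the training folds only, so that the held-out fold used to compute the AUC contains no synthetic rows.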

Highlights

  • Cancer is a leading cause of death worldwide (Siegel et al, 2018) and it is established that cancers are caused by genetic and epigenetic changes (Kanwal and Gupta, 2010; You and Jones, 2012)

  • The long non-coding RNA (lncRNA) growth arrest-specific transcript 5 (GAS5), which is down-regulated in almost all tumor tissues, can suppress the tumorigenesis of cervical cancer by down-regulating miR-196a and miR-205 (Yang et al, 2017), while lncRNA PVT1, which is up-regulated in non-small cell lung cancer (NSCLC), can promote tumor invasion and metastasis (Yang et al, 2014)

  • As the category of cancer-unrelated lncRNAs is difficult to define, and for consistency with other classifiers, we mapped a large number of phenotype-associated single-nucleotide polymorphisms (SNPs) from the NHGRI-EBI GWAS Catalog (Welter et al, 2014) onto lncRNA sequences, and selected as cancer-unrelated lncRNAs only those with no phenotype-related SNPs detected within 10 kb upstream or downstream
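
The negative-set rule in the last highlight can be sketched as a simple interval filter. The example below is a minimal illustration, assuming that each lncRNA is described by a chromosome, start, and end coordinate, and that the lncRNA body itself counts as part of the excluded window in addition to the 10 kb flanks. The function name `select_negative_lncrnas`, the tuple layout, and all coordinates are hypothetical and do not come from the paper.

```python
from bisect import bisect_left
from collections import defaultdict

FLANK = 10_000  # 10 kb on each side, as stated in the text


def select_negative_lncrnas(lncrnas, snps, flank=FLANK):
    """Return names of lncRNAs with no SNP inside [start - flank, end + flank].

    lncrnas: iterable of (name, chrom, start, end)
    snps:    iterable of (chrom, position)
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    negatives = []
    for name, chrom, start, end in lncrnas:
        positions = by_chrom.get(chrom, [])
        # First SNP at or after the left edge of the extended window.
        idx = bisect_left(positions, start - flank)
        hit = idx < len(positions) and positions[idx] <= end + flank
        if not hit:
            negatives.append(name)
    return negatives


# Hypothetical coordinates, for illustration only.
lncrnas = [
    ("lnc_A", "chr1", 50_000, 52_000),    # SNP 8 kb downstream -> excluded
    ("lnc_B", "chr1", 200_000, 203_000),  # nearest SNP 140 kb away -> kept
    ("lnc_C", "chr2", 10_000, 12_000),    # no SNPs on chr2 -> kept
]
snps = [("chr1", 60_000), ("chr1", 343_000)]
print(select_negative_lncrnas(lncrnas, snps))  # ['lnc_B', 'lnc_C']
```

Sorting the SNP positions once per chromosome and using a binary search keeps the filter fast even when the catalog holds many more SNPs than there are lncRNAs.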


Summary

Introduction

Cancer is a leading cause of death worldwide (Siegel et al, 2018) and it is established that cancers are caused by genetic and epigenetic changes (Kanwal and Gupta, 2010; You and Jones, 2012). The detection of PCA3 in urine is a more specific marker for prostate cancer diagnosis than the commonly used factor, prostate-specific antigen (PSA), and has been widely applied in the clinic (Hessels et al, 2003; Tinzl et al, 2004). Another example is lncRNA TUC339, which is highly enriched in extracellular vesicles secreted by hepatocellular carcinoma cells, where it regulates the growth and adhesion of tumor cells (Kogure et al, 2013). These features of lncRNAs prompted us to search for efficient methods to predict functional lncRNAs in cancer, to facilitate deeper understanding

