Abstract

Knowledge of long non-coding RNAs (lncRNAs) remains very limited, and discovering novel disease-related lncRNAs is a major research challenge in cancer studies. In this work, we developed an lncRNA network-based prioritization approach, named “LncNetP”, based on the competing endogenous RNA (ceRNA) and disease phenotype association assumptions. Applied to 11 cancer types with 3089 common lncRNA and miRNA samples from The Cancer Genome Atlas (TCGA), our approach yielded an average area under the ROC curve (AUC) of 83.87% under leave-one-out cross-validation, with the highest AUC (95.22%) for renal cell carcinoma. Moreover, we demonstrated the strong performance of our approach by evaluating influencing factors including disease phenotype associations, known disease lncRNAs and the number of cancer types. Comparisons with previous methods further suggested the value of our integrative approach. Taking hepatocellular carcinoma (LIHC) as a case study, we predicted four candidate lncRNA genes, RHPN1-AS1, AC007389.1, LINC01116 and BMS1P20, that may serve as novel risk factors for disease diagnosis and prognosis. In summary, our lncRNA prioritization strategy can efficiently identify disease-related lncRNAs and help researchers better understand the important roles of lncRNAs in human cancers.

Highlights

  • At least 90% of the human genome is actively transcribed, while protein-coding genes account for only ~2% of the genome sequence

  • Systematic identification of long non-coding RNA (lncRNA) associations using the competing endogenous RNA (ceRNA) assumption: for 11 cancer types, we obtained matched miRNA and lncRNA sequencing data, detected by the IlluminaHiSeq miRNASeq and IlluminaHiSeq RNASeqV2 platforms, respectively, from The Cancer Genome Atlas (TCGA) database. The cancer types were Bladder urothelial carcinoma (BLCA), Breast invasive carcinoma (BRCA), Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), Kidney renal clear cell carcinoma (KIRC), Brain lower grade glioma (LGG), Liver hepatocellular carcinoma (LIHC), Lung adenocarcinoma (LUAD), Prostate adenocarcinoma (PRAD), Stomach adenocarcinoma (STAD), Thyroid carcinoma (THCA) and Uterine corpus endometrioid carcinoma (UCEC)

  • For the top 10% of lncRNAs in the candidate lncRNA lists of hepatocellular carcinoma (LIHC), breast cancer (BRCA) and prostate cancer (PRAD), we found 39.11% (24371 out of 62319), 41.76% (26368 out of 63118) and 45.37% (28656 out of 63167) miRNA-lncRNA pairs with the same biological functions, respectively (Benjamini-Hochberg correction, p ≤ 0.05, Supplementary Table 2)
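
The last highlight reports the share of miRNA-lncRNA pairs that remain significant after Benjamini-Hochberg correction at p ≤ 0.05. As a minimal sketch of how such a share could be computed, the following applies the standard Benjamini-Hochberg adjustment to a list of raw p-values and counts the pairs that pass the threshold. The function names and the example p-values are hypothetical illustrations; the source does not describe the authors' actual enrichment pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fraction_significant(p_values, alpha=0.05):
    """Fraction of tests whose adjusted p-value is at most alpha."""
    if not p_values:
        return 0.0
    adjusted = benjamini_hochberg(p_values)
    hits = sum(1 for q in adjusted if q <= alpha)
    return hits / len(p_values)


# Hypothetical raw p-values for four miRNA-lncRNA pairs (not from the paper).
example_p = [0.001, 0.2, 0.03, 0.8]
example_adjusted = benjamini_hochberg(example_p)
example_fraction = fraction_significant(example_p)
```

In this toy input, the pair with a raw p-value of 0.03 no longer passes the 0.05 threshold once the correction is applied, which is why the corrected share is typically lower than the uncorrected one.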

Introduction

At least 90% of the human genome is actively transcribed, while protein-coding genes account for only ~2% of the genome sequence. The rest of the transcripts are non-coding RNAs, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) [1,2,3,4]. MiRNAs play important roles in cancer initiation, progression and metastasis, and some may serve as potential biomarkers for cancer diagnosis and prognosis [4]. LncRNAs, a class of transcripts longer than 200 nucleotides without protein-coding capacity, have also been found to regulate key cellular processes in carcinogenesis [1,2,3]. To date, 12,000 lncRNAs encoded in the human genome have been identified. Systematic studies have revealed “oncogene” and “tumor suppressor” lncRNAs in cancer [5]. Despite much progress made with high-throughput biological techniques, the identification of cancer-related lncRNAs remains a great challenge for researchers.

