Abstract

Background: Liver Hepatocellular Carcinoma (LIHC) is one of the major cancers worldwide, responsible for millions of premature deaths every year. Predicting the clinical stage is vital for choosing an optimal therapeutic strategy and for prognosis in cancer patients. However, to date, no method has been developed for predicting the stage of LIHC from the genomic profile of samples.

Methods: The Cancer Genome Atlas (TCGA) dataset of 173 early-stage (stage I), 177 late-stage (stage II, stage III and stage IV) and 50 adjacent normal tissue samples, covering 60,483 RNA transcripts and 485,577 methylation CpG sites, was analyzed to identify key transcriptomic expression and methylation-based features using different feature selection techniques. Classification models were then built on the selected features with different machine learning algorithms to categorize the classes of samples.

Results: In silico models were developed for classifying LIHC patients as early vs. late stage and samples as cancerous vs. normal, using RNA expression and DNA methylation data. The TCGA datasets were analyzed to identify differentially expressed RNA transcripts and methylated CpG sites that discriminate early vs. late stages and cancer vs. normal samples of LIHC with high precision. A Naive Bayes model built on 51 features (21 CpG methylation sites and 30 RNA transcripts) achieved a maximum MCC (Matthews correlation coefficient) of 0.58 with an accuracy of 78.87% on the validation dataset when discriminating early from late stage. The prediction models based on 5 RNA transcripts and 5 CpG sites classify LIHC and normal samples with an accuracy of 96–98% and an AUC (Area Under the Receiver Operating Characteristic curve) of 0.99. In addition, multiclass models were developed to classify samples as normal, early-stage or late-stage, achieving an accuracy of 76.54% and an AUC of 0.86.

Conclusion: Our study reveals that predicting the stage of LIHC samples with high accuracy from genomic and epigenomic profiles is a challenging task compared with classifying cancerous and normal samples. The comprehensive analysis, the differentially expressed RNA transcripts and methylated CpG sites in LIHC samples, and the prediction models are available from CancerLSP (http://webs.iiitd.edu.in/raghava/cancerlsp/).
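
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how accuracy and MCC are computed from a binary confusion matrix. The function names and the counts are hypothetical and chosen only for illustration; they are not the study's code or its confusion matrix.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when the denominator
    is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (NOT taken from the paper): "late stage" is
# treated as the positive class and "early stage" as the negative class.
tp, tn, fp, fn = 40, 35, 15, 10

print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are unevenly sized, which is one reason it is often reported alongside accuracy.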

Introduction

Liver Hepatocellular Carcinoma (LIHC), also known as Hepatocellular Carcinoma (HCC), is the fifth most common cancer and the second leading cause of cancer-related mortality, with nearly 788,000 deaths worldwide in 2015 [1]. In the United States, approximately 42,030 new cases and 31,780 deaths were estimated for 2018. The disease is nearly twice as frequent in males as in females, and more LIHC cases are reported in Africa and Asia than in Europe [2]. These observations indicate that many factors, such as viral hepatitis infection (hepatitis B or C), cirrhosis, smoking, alcohol and lifestyle, contribute to the disease.
