Abstract

Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes of CRC using gene expression data, but their results are inconsistent and further research is necessary. From a methodological point of view, a progressive approach is needed to identify molecular subtypes of human colon cancer from gene expression data. We propose an approach that integrates three steps: denoising by the Bayesian robust principal component analysis (BRPCA) algorithm, hierarchical clustering by the directed bubble hierarchical tree (DBHT) algorithm, and feature gene selection by an improved differential evolution based feature selection method (DEFSW). In this approach, a clustering is considered reasonable only if the normal samples are clustered completely and exclusively into one class, and the feature selection accounts for the imbalance in sample numbers among subtypes. With this approach, we identified the molecular subtypes of colon cancer on the mRNA gene expression dataset of 153 colon cancer samples and 19 normal control samples from The Cancer Genome Atlas (TCGA) project. The colon cancer samples were clustered into 7 subtypes with 44 feature genes. Our approach identified finer subtypes of colon cancer with fewer feature genes than two other recent studies, and it provides a generic methodology that might be applied to identify the subtypes of other cancers.
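The acceptance criterion stated above (normal samples clustered completely and exclusively into one class) can be illustrated with a minimal sketch. This is not the authors' code and does not implement BRPCA, DBHT, or DEFSW; the function name `normals_form_exclusive_cluster` and its two inputs (one cluster label per sample, one normal/tumor flag per sample) are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def normals_form_exclusive_cluster(
    labels: Sequence[Hashable], is_normal: Sequence[bool]
) -> bool:
    """Check whether all normal samples share one cluster that holds no tumor sample.

    labels[i] is the cluster assigned to sample i (hypothetical input);
    is_normal[i] is True for a normal control and False for a tumor sample.
    """
    if len(labels) != len(is_normal):
        raise ValueError("labels and is_normal must have the same length")

    # Clusters that contain at least one normal sample.
    normal_clusters = {lab for lab, flag in zip(labels, is_normal) if flag}
    if len(normal_clusters) != 1:
        # Either there are no normal samples, or they are split across clusters.
        return False

    (normal_cluster,) = normal_clusters
    # Clusters that contain at least one tumor sample.
    tumor_clusters = {lab for lab, flag in zip(labels, is_normal) if not flag}
    # "Exclusive" means no tumor sample shares the normal cluster.
    return normal_cluster not in tumor_clusters


# Toy example: two normals alone in cluster 0, three tumors in clusters 1 and 2.
toy_labels = [0, 0, 1, 1, 2]
toy_is_normal = [True, True, False, False, False]
print(normals_form_exclusive_cluster(toy_labels, toy_is_normal))  # True
```

A check of this kind could be run on each candidate clustering to discard those that split the normal controls or mix them with tumor samples. How the original study applied the criterion in practice is not described in this section.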

Highlights

  • Identifying the molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment in the future

  • Our study suggests that the methods based on differential evolution (DE) in [17,18] can achieve remarkably good results compared with other well-known feature selection methods

  • The microarray mRNA gene expression dataset we used to identify the subtypes of colon cancer is from the Cancer Genome Atlas (TCGA)

Introduction

Identifying the molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment in the future. Several studies have predicted molecular subtypes of CRC from gene expression data. Using consensus clustering based on self-organizing maps, a nearest centroid classifier, and hierarchical clustering, Muzny et al. showed that, at the gene expression level, CRC has MSI/CIMP, CIN, and invasive subtypes with 1020 signature genes (340 genes per class) [2]. Ren et al. used consensus clustering based on K-means to identify the ECL1 and ECL2 subtypes of colon cancer and further classified ECL1 into three subclasses [5]. The subtypes of CRC found in these previous studies appear to be inconsistent, and further research is necessary. From a methodological point of view, a progressive approach is needed to identify the finer subtypes
