On an ensemble algorithm for clustering cancer patient data.

Ran Qi, Arnold Schwartz, Kai Xing, Eric Xu, Dechang Chen, Li Sheng, Donald Henson, Dengyuan Wu

doi:10.1186/1752-0509-7-s4-s9

Abstract

Background

The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Although results obtained with EACCD have been reported, the algorithm itself has not been analyzed. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches.

Results

Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. However, the statistical tests employed in the learning step have minimal effect on the dendrogram for large m. In addition, if the step for learning dissimilarity is omitted from EACCD, the resulting approaches can perform worse. Furthermore, clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters.

Conclusions

When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.
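
The abstract names three ingredients: an initial dissimilarity derived from a statistical test, a learning step that runs PAM m times, and a linkage function that produces the dendrogram. The sketch below is a minimal illustration of how such a pipeline could fit together, not the authors' implementation. Several parts are assumptions that the abstract does not confirm: the number of clusters K is drawn at random in each run, the learned dissimilarity is taken to be one minus the fraction of runs in which two combinations share a cluster, the swap-based `pam` helper is a simplified stand-in for a full PAM implementation, and the toy initial dissimilarities are made-up numbers rather than test statistics computed from patient data. All function names are hypothetical.

```python
import random
from itertools import combinations

def pam(D, k, rng):
    """Simplified swap-based k-medoids on a dissimilarity matrix D (stand-in for PAM)."""
    n = len(D)
    medoids = rng.sample(range(n), k)

    def cost(meds):
        # Total dissimilarity of every item to its nearest medoid.
        return sum(min(D[i][j] for j in meds) for i in range(n))

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = trial, c, True
    # Label each item by the position of its nearest medoid.
    return [min(range(k), key=lambda p: D[i][medoids[p]]) for i in range(n)]

def learn_dissimilarity(D0, m, seed=0):
    """Assumed learning step: 1 - (fraction of m runs in which i and j co-cluster)."""
    rng = random.Random(seed)
    n = len(D0)
    together = [[0] * n for _ in range(n)]
    for _ in range(m):
        k = rng.randint(2, n - 1)  # assumption: K is chosen at random per run
        labels = pam(D0, k, rng)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [[0.0 if i == j else 1.0 - together[i][j] / m for j in range(n)]
            for i in range(n)]

def agglomerate(D, linkage="average"):
    """Agglomerative clustering; returns the merge sequence that defines a dendrogram."""
    agg = {"single": min, "complete": max,
           "average": lambda v: sum(v) / len(v)}[linkage]
    clusters = [(i,) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        def dist(pair):
            a, b = pair
            return agg([D[i][j] for i in a for j in b])
        a, b = min(combinations(clusters, 2), key=dist)
        merges.append((a, b, dist((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges

# Toy input: six hypothetical combinations forming two well-separated groups.
# In practice D0 would come from a two-sample survival test between combinations.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D0 = [[abs(x - y) for y in values] for x in values]

learned = learn_dissimilarity(D0, m=200)
for a, b, height in agglomerate(learned, linkage="average"):
    print(a, b, round(height, 3))
```

Changing `m`, the `linkage` argument, or the way `D0` is built gives a simple way to explore the kinds of sensitivity the abstract reports, although conclusions drawn from toy data should not be read back into the paper's results.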

Highlights

The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis
We examined the effect of different settings in the ensemble algorithm for clustering cancer data (EACCD) on the dendrogram generated by the algorithm
In an application study, EACCD, when applied to the breast cancer data, generated a dendrogram (Figure 1(a)) that exhibits one relationship among the 12 survival curves corresponding to the 12 combinations

Summary

Results

Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). The dendrograms depend on the linkage functions. The statistical tests employed in the learning step have minimal effect on the dendrogram for large m. If the step for learning dissimilarity is omitted from EACCD, the resulting approaches can perform worse. Clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters.



Journal: BMC Systems Biology
Publication Date: Oct 1, 2013
Citations: 22
License type: cc-by


