Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

Guangtao Ge, G William Wong

doi:10.1186/1471-2105-9-275

Open Access

Abstract

Background

Pancreatic cancer is the fourth leading cause of cancer death in the United States. Consequently, identification of clinically relevant biomarkers for the early detection of this cancer type is urgently needed. In recent years, proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into the processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. However, the high dimensionality of proteomics data, combined with their relatively small sample sizes, poses a significant challenge to current data mining methodology, since many standard methods cannot be applied directly. Here, we propose a novel methodological framework based on machine learning, in which decision tree-based classifier ensembles, coupled with feature selection methods, are applied to proteomics data generated from premalignant pancreatic cancer.

Results

This study explores the utility of three different feature selection schemes (Student t test, Wilcoxon rank sum test and genetic algorithm) for reducing the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected by each method, we compared the prediction performance of a single decision tree algorithm, C4.5, with that of six different decision tree-based classifier ensembles (Random forest, Stacked generalization, Bagging, Adaboost, Logitboost and Multiboost). We show that the ensemble classifiers always outperform the single decision tree classifier, with greater accuracies and smaller prediction errors, when applied to a pancreatic cancer proteomics dataset.

Conclusion

In our cross-validation framework, classifier ensembles generally have better classification accuracies than a single decision tree when applied to a pancreatic cancer proteomic dataset, suggesting their utility in future proteomics data analysis. Additionally, the use of feature selection methods allows us to select biomarkers with potentially important roles in cancer development, highlighting the validity of this method.
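The Results paragraph ranks mass-spectrometry features with univariate filters before any classifier is trained. A minimal pure-Python sketch of that filter step follows, assuming a two-class dataset stored as a list of rows (one spectrum per row, one intensity feature per column) with labels 0 and 1. The function names are hypothetical, the Wilcoxon statistic uses the normal approximation without a tie correction, and the genetic-algorithm selector is not shown; this is an illustration, not the authors' implementation.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student t statistic with pooled variance (needs >= 2 values per group)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    denom = math.sqrt(pooled * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if denom == 0:
        # Both groups constant: infinite separation if the means differ.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_z(a, b):
    """Wilcoxon rank sum statistic for group a, standardized with the
    normal approximation (no tie correction)."""
    na, nb = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum = sum(ranks[:na])  # ranks of group a occupy the first na slots
    expected = na * (na + nb + 1) / 2
    sd = math.sqrt(na * nb * (na + nb + 1) / 12)
    return (rank_sum - expected) / sd


def top_features(X, y, k, score):
    """Indices of the k features with the largest |score(class 1, class 0)|."""
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scored.append((abs(score(pos, neg)), j))
    # Sort by score descending; ties keep the lower feature index first.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:k]]


# Usage on a hypothetical toy matrix: feature 0 separates the classes,
# feature 1 has equal group means, feature 2 differs only slightly.
X = [[1.0, 5.0, 0.3], [1.2, 5.1, 0.1], [0.9, 4.9, 0.2],
     [3.0, 5.0, 0.25], [3.1, 5.2, 0.15], [2.9, 4.8, 0.35]]
y = [0, 0, 0, 1, 1, 1]
print(top_features(X, y, 2, student_t))   # [0, 2]
print(top_features(X, y, 2, wilcoxon_z))  # [0, 2]
```

Ranking by the absolute statistic keeps features that are either higher or lower in the cancer group; the top-k indices then define the reduced feature set passed to the classifiers.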

Highlights

Pancreatic cancer is the fourth leading cause of cancer death in the United States
We propose the use of more accurate decision tree-based classifier ensembles combined with feature selection methods to address some of the challenges facing current cancer proteomics data analysis
Biological data sets generated from proteomics studies typically have a very high number of features compared to their small sample sizes
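The comparison between a single decision tree and a tree ensemble under cross-validation can be sketched without external libraries. The example below assumes binary labels and uses one-level decision stumps as a deliberately simplified stand-in for C4.5 trees, Bagging (bootstrap resampling plus majority vote) as the only ensemble, and synthetic data from the hypothetical helper `make_toy_data`; it illustrates the evaluation loop, not the paper's experimental setup or results.

```python
import random


def fit_stump(X, y):
    """Fit a one-level decision tree (stump): the feature, threshold and
    polarity with the fewest training errors. Returns a predict(row) function."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            for polarity in (1, 0):  # label predicted when row[j] > t
                errors = sum(
                    (polarity if row[j] > t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: polarity if row[j] > t else 1 - polarity


def fit_bagging(X, y, n_estimators, rng):
    """Bagging: train each stump on a bootstrap sample, predict by majority vote."""
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes_for_1 = sum(m(row) for m in models)
        return 1 if 2 * votes_for_1 > len(models) else 0  # ties go to class 0

    return predict


def cv_accuracy(X, y, fit, k, rng):
    """k-fold cross-validated accuracy of fit(X_train, y_train) -> predict."""
    order = list(range(len(X)))
    rng.shuffle(order)
    correct = 0
    for f in range(k):
        test = order[f::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(X)


def make_toy_data(rng, n_per_class=10, n_noise=4):
    """Hypothetical data: feature 0 separates the classes, the rest are noise."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.uniform(10, 11) if label else rng.uniform(0, 1)
            X.append([informative] + [rng.uniform(0, 1) for _ in range(n_noise)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(0))
    single = cv_accuracy(X, y, fit_stump, k=5, rng=random.Random(1))
    bagged = cv_accuracy(
        X, y, lambda Xt, yt: fit_bagging(Xt, yt, 25, random.Random(2)),
        k=5, rng=random.Random(1),
    )
    print(f"single stump: {single:.2f}  bagged stumps: {bagged:.2f}")
```

Because both calls shuffle with the same seed, the two models are scored on identical folds, which keeps the comparison paired. On this trivially separable toy data both models are expected to score perfectly; the sketch shows the mechanics, not the accuracy gap reported for the real dataset.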

Summary

Introduction

Identification of clinically relevant biomarkers for the early detection of pancreatic cancer is urgently needed. Proteomics profiling techniques combined with various data analysis methods have been successfully used to gain critical insights into the processes and mechanisms underlying pathologic conditions, particularly as they relate to cancer. In the United States, approximately 30,000 new cases are diagnosed each year, and only 4% of patients survive 5 years or more after diagnosis. These grim statistics necessitate the urgent development of methods that facilitate early detection and … Despite advances in our knowledge of the pathophysiology of pancreatic cancer in recent years [2,3], we still lack an effective method to diagnose this cancer type early enough to affect treatment outcomes.

Journal: BMC Bioinformatics	Publication Date: Jun 11, 2008
Citations: 134	License type: CC BY 2.0

Similar Papers

Application of Feature Selection Methods and Ensembles on Network Security Dataset
Amir Ahmad ... Shilpi Bisht
International Journal of Computer Applications | VOL. 135
17 Feb 2016

Bioinformatics in proteomics: application, terminology, and pitfalls
Jan C Wiemer ... Alexander Prokudin
Pathology - Research and Practice | VOL. 200
01 Apr 2004

ForEx++: A New Framework for Knowledge Discovery from Decision Forests
Md Nasim Adnan ... Md Zahidul Islam
Australasian Journal of Information Systems | VOL. 21
08 Nov 2017

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers
Fadare Oluwaseun Gbenga ... Adetunmbi Adebayo Olusola
International Journal of Engineering and Advanced Technology | VOL. 10
30 Apr 2021
