Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

Wouter Wegdam, Marrije R Buist, Emiel Van Themaat, Johannes MFG Aerts, Perry D Moerland, Boris Bleijlevens, Huub CJ Hoefsloot, Chris G De Koster

doi:10.1186/1477-5956-7-19

Open Access

https://doi.org/10.1186/1477-5956-7-19

Abstract

Background

Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods.

Results

Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high.

We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated.

Conclusion

We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre-processing parameters lead to large differences in classification accuracy and are therefore of crucial importance. We advocate the evaluation over a range of parameter settings when comparing pre-processing methods. Our analysis also demonstrates that reliable classification results can be obtained with a combination of strict sample handling and a well-defined classification protocol on clinical samples.
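The double cross-validation protocol with repeated random sampling mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the tuned parameter (number of top-ranked peaks), the toy data, and all function names below are assumptions chosen only to keep the example self-contained. The point it shows is that peak ranking and parameter selection happen inside the training portion of each random split, so the held-out samples never influence the model that is scored on them.

```python
import random
from statistics import mean


def fit(X, y, n_peaks):
    """Nearest-centroid model on the n_peaks features whose class means differ most.

    Ranking uses only the samples passed in. Assumes exactly two classes.
    """
    a, b = sorted(set(y))
    cents = {
        c: [mean(col) for col in zip(*(x for x, lab in zip(X, y) if lab == c))]
        for c in (a, b)
    }
    ranked = sorted(range(len(X[0])), key=lambda j: -abs(cents[a][j] - cents[b][j]))
    return ranked[:n_peaks], cents


def predict(model, x):
    keep, cents = model
    # Assign the class whose centroid is closest on the selected features.
    return min(cents, key=lambda c: sum((x[j] - cents[c][j]) ** 2 for j in keep))


def accuracy(model, X, y):
    return mean(1.0 if predict(model, x) == lab else 0.0 for x, lab in zip(X, y))


def stratified_folds(y, idx, k, rng):
    """Randomly deal the indices of each class round-robin into k folds."""
    by_class = {}
    for i in idx:
        by_class.setdefault(y[i], []).append(i)
    folds, pos = [[] for _ in range(k)], 0
    for c in sorted(by_class):
        rng.shuffle(by_class[c])
        for i in by_class[c]:
            folds[pos % k].append(i)
            pos += 1
    return folds


def select_n_peaks(X, y, idx, grid, k, rng):
    """Inner loop: k-fold CV on the training indices only."""
    folds = stratified_folds(y, idx, k, rng)
    best, best_acc = None, -1.0
    for n_peaks in grid:
        accs = []
        for fold in folds:
            held = set(fold)
            tr = [i for i in idx if i not in held]
            model = fit([X[i] for i in tr], [y[i] for i in tr], n_peaks)
            accs.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
        if mean(accs) > best_acc:
            best, best_acc = n_peaks, mean(accs)
    return best


def double_cv(X, y, grid=(1, 2, 4), repeats=20, k=3, seed=0):
    """Outer loop: repeated random stratified hold-out of one k-th of the samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = stratified_folds(y, range(len(X)), k, rng)
        test, train = folds[0], [i for f in folds[1:] for i in f]
        n_peaks = select_n_peaks(X, y, train, grid, k, rng)
        model = fit([X[i] for i in train], [y[i] for i in train], n_peaks)
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


def make_toy_data(n_per_class=15, n_features=6, seed=1):
    """Synthetic 'peak intensities': only feature 0 separates the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 5.0 * label
            X.append(row)
            y.append(label)
    return X, y
```

Calling `double_cv(*make_toy_data())` returns one test accuracy per random split; averaging them gives the overall estimate. To compare two pre-processing methods in this spirit, one would run the same function on the peak tables each method produces from the same spectra, ideally over several peak detection settings.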

Highlights

Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease.
We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset.
Combining all of these protocols, ranging from sample collection via pre-processing to classification, we aimed to develop the optimal strategy for analyzing complex mass spectrometry generated datasets such as surface-enhanced laser desorption/ionization (SELDI)-time of flight (TOF) MS datasets.

Summary

Introduction

Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. With mass spectrometry techniques such as MALDI-TOF and SELDI-TOF, it has become possible to analyse complex protein mixtures, such as those found in serum, relatively quickly. This has led to the discovery of a large number of proteins and protein profiles associated with various types of disease [1,2,3,4]. After promising initial reports, important questions have been raised about the reproducibility and reliability of the technique [5]. Reasons for these shortcomings range from pre-analytical effects, such as sample storage and the number of freeze-thaw cycles [6], to the analytical problems of bias due to overfitting and lack of external validation. One effort towards standardization of pre-analytical variables is being undertaken by the Specimen Collection and Handling Committee of the HUPO Plasma Proteome Project [10].

Journal: Proteome Science	Publication Date: Jan 1, 2009
Citations: 36	License type: cc-by
