Abstract

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is increasingly being used to explore differences in the proteomic composition of complex biological systems. The reliability and utility of such comparative protein expression profiling studies are critically dependent on an accurate and rigorous assessment of quantitative changes in the relative abundance of the myriad proteins typically present in a biological sample such as blood or tissue. In this review, we provide an overview of key statistical and computational issues relevant to bottom-up shotgun global proteomic analysis, with an emphasis on methods that can be applied to improve the dependability of biological inferences drawn from large proteomic datasets. Taking a start-to-finish approach, we address the following topics: 1) low-level data processing steps, such as formation of a data matrix, filtering, and baseline subtraction to minimize noise; 2) mid-level processing steps, such as data normalization, alignment in time, peak detection, peak quantification, peak matching, and error models, to facilitate profile comparisons; and 3) high-level processing steps, such as sample classification and biomarker discovery, and related topics such as significance testing, multiple testing, and choice of feature space. We report on approaches that have recently been developed for these steps, discuss their merits and limitations, and propose areas deserving of further research.
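
Among the high-level issues listed above is multiple testing. When thousands of peptide features are tested at once, an uncorrected per-feature cutoff of 0.05 lets many false positives through. As one standard illustration (not necessarily the procedure this review recommends), the Benjamini-Hochberg step-up procedure controls the false discovery rate. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking features rejected at FDR level alpha.

    Benjamini-Hochberg step-up rule: sort the m p-values ascending, find the
    largest rank k with p_(k) <= k * alpha / m, and reject the k smallest.
    """
    m = len(p_values)
    # Feature indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank  # keep the largest passing rank (step-up)
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


# Hypothetical p-values from per-feature tests (e.g., case vs. control).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
naive = sum(pv < 0.05 for pv in p)  # features passing an uncorrected 0.05 cut
print(naive, sum(benjamini_hochberg(p)))  # prints "5 2"
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected if a larger p-value passes at a higher rank.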

Highlights

  • The field of expression proteomics seeks to answer the following questions: 1) which proteins and variant isoforms are expressed during the lifecycle of an organism; 2) which post-translational modifications occur in each of these proteins; 3) how these patterns differ across cell types and tissues and under different developmental, physiological, and disease conditions; and 4) how biologists can use this information to better understand the molecular basis of fundamental biological processes and to monitor the course of disease so as to improve clinical diagnosis and treatment (1–3).

  • These questions are made all the more difficult by the complexity of most biological systems, which increases exponentially as one moves downstream from DNA sequence to mRNA intermediates to protein end-products. Although the human genome appears to encode far fewer genes than first anticipated, it is estimated that >60% of the ~25,000 putative ORFs encode more than one splice variant, and these variants are in turn frequently subject to post-translational modification (4, 5).

Summary

Peak detection and quantification

Classification algorithms and biomarker discovery

… interesting peptide patterns and to match new patterns to previously discovered ones. Rather than starting with peak detection and peptide identification to find disease biomarkers in LC-MS patterns, one can treat the data as a signal matrix, which allows established methods in signal processing, statistics, and machine learning (13, 16, 17) to be applied to tease out interesting and relevant patterns. The system of Radulovic et al. (18) performs filtering, normalization, peak detection, quantification, alignment, and classification; the authors also establish that LC-MS peak signal is linear in peptide concentration. Several LC-MS studies have focused on particular low- or mid-level data processing steps, such as noise reduction, error models (which model the variance of peptide abundance), or alignment in time, while others have been devoted strictly to using the MS signal for sample classification. No methods have been reported for evaluating the optimal bin width, nor for determining the sensitivity of downstream calculations to this parameter.
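
The signal-matrix view above starts by binning raw LC-MS points into a matrix. The sketch below shows that step under simple assumptions: each point is a (scan index, m/z, intensity) triple, rows are scans in retention-time order, and intensities falling in the same fixed-width m/z bin of a scan are summed. The function name `bin_to_matrix` and the data are hypothetical. It does not choose an optimal bin width. It only shows why the width matters: at 1.0 m/z two nearby peaks merge into one column, while at 0.5 m/z they stay separate.

```python
import math


def bin_to_matrix(points, mz_min, mz_max, bin_width):
    """Sum (scan_index, mz, intensity) points into a scans x m/z-bins matrix.

    Rows are scan indices (retention time); columns are fixed-width m/z bins
    covering [mz_min, mz_max). Points outside that range are ignored.
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    n_scans = max((scan for scan, _, _ in points), default=-1) + 1
    matrix = [[0.0] * n_bins for _ in range(n_scans)]
    for scan, mz, intensity in points:
        if mz_min <= mz < mz_max:
            col = int((mz - mz_min) // bin_width)
            col = min(col, n_bins - 1)  # guard against floating-point round-off
            matrix[scan][col] += intensity
    return matrix


# Hypothetical data: two nearby peaks in scan 0, one peak in scan 1.
points = [(0, 400.2, 10.0), (0, 400.7, 5.0), (1, 401.3, 8.0)]
print(bin_to_matrix(points, 400.0, 402.0, 1.0))  # [[15.0, 0.0], [0.0, 8.0]]
print(bin_to_matrix(points, 400.0, 402.0, 0.5))  # [[10.0, 5.0, 0.0, 0.0], [0.0, 0.0, 8.0, 0.0]]
```

Summing within a bin is only one choice; averaging or taking the maximum are alternatives, and a sensible width depends on the instrument's m/z resolution.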

SIGNAL FILTERING AND BACKGROUND SUBTRACTION
DATASET ALIGNMENT AND COMPARISON
DATA NORMALIZATION
DATA TRANSFORMATIONS AND ERROR MODELS
STATISTICAL ISSUES
MACHINE LEARNING AS APPLIED TO MS
EVALUATING CURRENT METHODS
FUTURE PROSPECTS