Protein O-glycosylation has long been recognized as closely associated with many diseases, particularly with tumor proliferation, invasion, and metastasis. The ability to efficiently profile variation in O-glycosylation across large-scale clinical samples provides an important approach for developing biomarkers for cancer diagnosis and for evaluating therapeutic response. Mass spectrometry (MS)-based techniques for high-throughput, in-depth, and reliable elucidation of protein O-glycosylation in large clinical cohorts are therefore in high demand. However, the prevalence of serine and threonine residues in the proteome, together with the tens of mammalian O-glycan types, leads to an extremely large search space of millions of theoretical peptide and O-glycan combinations for intact O-glycopeptide database searching. As a result, database searching takes an exceptionally long time, which is a major obstacle in O-glycoproteome studies of large clinical cohorts. More importantly, because of the low abundance and poor ionization of intact O-glycopeptides and the stochastic nature of data-dependent MS2 acquisition, the level of missing data inevitably rises substantially as the number of samples increases, which undermines quantitative comparison across samples. To address these problems, we report a new MS data processing strategy that integrates glycoform-specific database searching, reference library-based MS1 feature matching, and MS2 identification propagation for fast identification and in-depth, reproducible label-free quantification of O-glycosylation of human urinary proteins. This strategy increases database searching speed by up to 20-fold and yields a 30%-40% increase in intact O-glycopeptide quantification in individual samples, with clearly improved reproducibility. In total, we identified 1300 intact O-glycopeptides in 36 healthy human urine samples with a 30%-40% reduction in the amount of missing data. This is currently the largest urinary O-glycoproteome dataset and demonstrates the application potential of this new strategy in large-scale clinical investigations.
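The reference library-based MS1 feature matching and identification propagation described above can be illustrated with a minimal sketch. This is not the authors' implementation. The class names (`LibraryEntry`, `Feature`), the function names, and the tolerances (10 ppm in m/z, 1 min in retention time) are all hypothetical choices made for illustration. The idea shown is that an MS1 feature lacking an MS2 identification in one sample can inherit an identification from a library entry with the same charge and a matching m/z and retention time, which is one way missing values could be reduced.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LibraryEntry:
    """An identified intact O-glycopeptide in the reference library (hypothetical layout)."""
    glycopeptide: str
    mz: float
    charge: int
    rt_min: float


@dataclass(frozen=True)
class Feature:
    """An MS1 feature detected in one sample, with no MS2 identification of its own."""
    mz: float
    charge: int
    rt_min: float
    intensity: float


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(
    feature: Feature,
    library: Iterable[LibraryEntry],
    ppm_tol: float = 10.0,     # assumed tolerance, not taken from the paper
    rt_tol_min: float = 1.0,   # assumed tolerance, not taken from the paper
) -> Optional[LibraryEntry]:
    """Return the closest library entry within both tolerances, or None."""
    best, best_score = None, None
    for entry in library:
        if entry.charge != feature.charge:
            continue
        ppm = abs(ppm_error(feature.mz, entry.mz))
        drt = abs(feature.rt_min - entry.rt_min)
        if ppm > ppm_tol or drt > rt_tol_min:
            continue
        # Combine both deviations, each scaled by its tolerance.
        score = ppm / ppm_tol + drt / rt_tol_min
        if best_score is None or score < best_score:
            best, best_score = entry, score
    return best


def propagate_identifications(
    features: Iterable[Feature],
    library: Iterable[LibraryEntry],
    ppm_tol: float = 10.0,
    rt_tol_min: float = 1.0,
) -> Dict[str, float]:
    """Sum the intensities of matched features per library glycopeptide."""
    library = list(library)
    quant: Dict[str, float] = {}
    for feature in features:
        entry = match_feature(feature, library, ppm_tol, rt_tol_min)
        if entry is not None:
            quant[entry.glycopeptide] = quant.get(entry.glycopeptide, 0.0) + feature.intensity
    return quant
```

In a real workflow, retention times would typically be aligned between runs before matching, and propagated identifications would need some form of error control; both are omitted here for brevity.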