Abstract
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis.
Highlights
The knowledge of a protein’s subcellular localization is of paramount biological importance, as is the reliable high-throughput assessment of localization.
We considered three sources of markers for use as input for the phenoDisco algorithm: (i) a manually and highly curated set from experts in the field (20 endoplasmic reticulum (ER), 6 Golgi, mitochondrial, and plasma membrane markers from our curated marker sets, originally obtained from Ref. 18); (ii) unique Gene Ontology (GO) cellular compartment (CC) annotations assigned a localization based on experimental evidence, plus those assigned a unique localization inferred from sequence or structural similarity in the GO database; and (iii) only unique GO CC annotations assigned a localization based on experimental evidence in the GO database.
We have described a typical analysis pipeline for organelle proteomics data and clarified some central machine learning concepts applied to such data.
Summary
The knowledge of a protein’s subcellular localization is of paramount biological importance, as is the reliable high-throughput assessment of localization. Various experimental designs have been proposed, from those focused merely on the identification of proteins in single organelles through biochemical purification (pure fraction cataloging) to more complex methods that use quantitative mass spectrometry to elucidate the broad subcellular diversity of cells (fractionation-by-centrifugation approaches). Techniques of the former kind, which focus on a single organelle or a limited number of organelles, suffer from two major drawbacks: they may give rise to misleading or erroneous associations without revealing a broader, biologically more meaningful picture, and they suffer from substantial contamination caused by incomplete purification/enrichment. Dunkley et al. [7] published the localization of organelle proteins by isotope tagging (LOPIT) technique, and Foster et al. [8] described protein correlation profiling (PCP) using label-free quantitation. These methods enable measurement of steady-state protein distributions, providing more realistic insight into subcellular localization while removing the requirement to purify the organelles of interest, and they help discriminate between genuine organelle residents and contaminants. A protein whose profile matches that of an organelle marker can be assigned to that marker's organelle.
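The profile-matching idea can be illustrated with a minimal sketch. It is not the method used in the paper, which relies on machine learning classifiers; it only shows the principle of comparing a protein's distribution across fractions with those of organelle markers. The function names, the Pearson correlation as similarity measure, the 0.9 threshold, and the four-fraction abundance values are all assumptions made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, treat as no match
    return cov / (sx * sy)


def assign_by_profile(profile, marker_profiles, min_corr=0.9):
    """Return (organelle, correlation) for the best-matching marker profile.

    The label is "unknown" when even the best correlation is below
    min_corr (a hypothetical threshold, not taken from the paper).
    """
    best_label, best_corr = "unknown", -1.0
    for label, marker in marker_profiles.items():
        r = pearson(profile, marker)
        if r > best_corr:
            best_label, best_corr = label, r
    if best_corr < min_corr:
        return "unknown", best_corr
    return best_label, best_corr


# Made-up relative abundances across four gradient fractions.
markers = {
    "ER": [0.10, 0.20, 0.40, 0.30],
    "mitochondrion": [0.50, 0.30, 0.15, 0.05],
}
query = [0.12, 0.18, 0.42, 0.28]
label, corr = assign_by_profile(query, markers)
print(label, round(corr, 3))
```

In practice, one marker profile per organelle is a strong simplification: real marker sets contain many proteins per organelle, which is why classifiers trained on all markers are preferred over a single nearest-profile lookup.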