Abstract

Microbial pathogens have evolved numerous mechanisms to hijack host systems, thus causing disease. This is mediated by alterations in the combined host-pathogen proteome in time and space. Mass spectrometry-based proteomics approaches have been developed and tailored to map disease progression. The result is complex multidimensional data that pose numerous analytic challenges for downstream interpretation. However, a systematic review of approaches for the downstream analysis of such data has been lacking in the field. In this review, we detail the steps of a typical temporal and spatial analysis, including data pre-processing steps (i.e., quality control, data normalization, the imputation of missing values, and dimensionality reduction), different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data. We also discuss current best practices for these steps, based on a collection of independent studies, to guide users in selecting the most suitable strategies for their dataset and analysis objectives. In addition, we have compiled a list of commonly used R software packages for each step of the analysis, which can be easily integrated into one’s analysis pipeline. We further guide readers through the various analysis steps by applying these workflows to mock and host-pathogen interaction data from public datasets. The workflows presented in this review will serve as an introduction for data analysis novices, while also helping established users update their data analysis pipelines. We conclude the review by discussing future directions and developments in temporal and spatial proteomics and data analysis approaches. The data analysis code prepared for this review is available from https://github.com/BabuLab-UofR/TempSpac, where guidelines and sample datasets are also offered for testing purposes.

Highlights

  • We present frameworks for the analysis of temporal and spatial proteomic data from host-pathogen interaction (HPI) studies, focusing on specific examples and robust methods adapted from statistics and machine learning

  • The results indicated that human cytomegalovirus infection resulted in rapid depletion of CD155 from the cell surface at the same time as the total amount of CD155 in the whole cell increased (Weekes et al, 2014)

  • Advances in sample preparation methods, mass spectrometry, and computational facilities and approaches make it possible to produce and analyze a wealth of proteomic HPI data, revealing changes that occur across space and time in response to an infection

Summary

INTRODUCTION

Intracellular pathogens, including viruses, bacteria (Auweter et al, 2011; Schweppe et al, 2015; Lopez et al, 2016), parasites, and fungi (Iyer et al, 2007; Gilbert et al, 2015; May and Casadevall, 2018; Eisenreich et al, 2019), cause numerous deaths and impose staggering healthcare costs (Kamaruzzaman et al, 2017). The interplay between the host and the pathogen results in disease. This interplay in host-pathogen interactions (HPI) is highly complex and dynamic. The typical output of a quantitative MS experiment that maps temporal and/or spatial changes during an infection includes highly complex, multi-dimensional data matrices with protein abundances across space or time represented by ion intensities or spectral counts, depending on the MS approach. Such data are challenging to analyze and interpret. Although the examples in this review focus on intracellular pathogens, the same pipelines can be used, e.g., in the analysis of genetic or environment-induced disease.
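To make the shape of such a data matrix concrete, the following is a minimal sketch of three of the pre-processing steps named in the abstract (transformation, normalization, and imputation of missing values) applied to a small mock protein-by-time-point matrix. It is written in plain Python for illustration only and is not the code from the review's repository. The protein names, intensity values, and helper function names are all hypothetical, and minimum-value imputation is just one simple assumption among the many strategies in use.

```python
import math
from statistics import median

# Hypothetical mock data: protein -> ion intensities at four time points.
# None marks a missing value (protein not quantified in that sample).
mock_intensities = {
    "P1": [1000.0, 2000.0, 4000.0, 8000.0],
    "P2": [500.0, None, 500.0, 500.0],
    "P3": [4000.0, 4000.0, 2000.0, 1000.0],
}


def log2_transform(matrix):
    """Log2-transform every observed intensity; keep missing values as None."""
    return {
        protein: [None if v is None else math.log2(v) for v in row]
        for protein, row in matrix.items()
    }


def median_normalize(matrix):
    """Subtract each sample's (column's) median of observed values."""
    n_cols = len(next(iter(matrix.values())))
    col_medians = [
        median(row[j] for row in matrix.values() if row[j] is not None)
        for j in range(n_cols)
    ]
    return {
        protein: [
            None if v is None else v - col_medians[j] for j, v in enumerate(row)
        ]
        for protein, row in matrix.items()
    }


def impute_row_min(matrix):
    """Fill missing values with the protein's own minimum observed value.

    Proteins with no observed values at all are dropped.
    """
    imputed = {}
    for protein, row in matrix.items():
        observed = [v for v in row if v is not None]
        if not observed:
            continue
        fill = min(observed)
        imputed[protein] = [fill if v is None else v for v in row]
    return imputed


normalized = median_normalize(log2_transform(mock_intensities))
processed = impute_row_min(normalized)

for protein, profile in processed.items():
    print(protein, [round(v, 2) for v in profile])
```

The order of operations here (transform, then normalize, then impute) is a design choice made for this sketch. In practice, both the order and the method used at each step depend on the dataset and on why values are missing.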
