Investigating the proteome can add a significant layer of information to the wealth of existing methylation, mutation, and transcriptome data on brain tumors, as proteins represent the pharmacologically addressable phenotype of a disease. However, small cohorts limit the usability and validity of statistical methods, and variable technical setups and high numbers of missing values make data integration from public sources challenging. Using a newly developed framework that reduces batch effects without requiring data reduction or missing-value imputation, we show, based on in-house and publicly available datasets, successful integration of proteomic data across different tissue types, quantification platforms, and technical setups. As an example, data from a Sonic hedgehog (Shh) medulloblastoma mouse model were analyzed, showing efficient data integration independent of tissue preservation strategy or batch. We further integrated batches of publicly available data on human brain tumors, confirming proposed proteomic cancer subtypes that correlate with clinical features. We show that missing-value-tolerant reduction of technical variance may help to identify biomarkers, proteomic signatures, and altered pathways characteristic of molecular brain cancer subtypes.