MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects

He Wang,Kai Peng Lim,Weijia Kong,Huanhuan Gao,Bertrand Jern Han Wong,Ser Xian Phua,Tiannan Guo,Wilson Wen Bin Goh

doi:10.1038/s41597-023-02779-8

He Wang, Kai Peng Lim + Show 6 more

Open Access

https://doi.org/10.1038/s41597-023-02779-8

Copy DOI

Abstract

Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms.

Full Text