Abstract

Cell surface proteins serve as primary drug targets and cell identity markers. The emergence of techniques like CITE-seq has enabled simultaneous quantification of surface protein abundance and transcript expression for multimodal data analysis within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance based solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability for these computational approaches across diverse contexts, such as different tissues or disease states, impede their widespread adoption. Here we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA-seq), a context-agnostic zero-shot deep ensemble model, which enables the large-scale prediction of cell surface protein abundance and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.