Abstract
The high heterogeneity of high-grade serous ovarian cancer (HGSOC), both intra-tumoral and inter-patient, is one of the main factors hampering the identification of novel therapeutic treatments to improve the survival of HGSOC patients. Single-cell technologies afford new inroads for the deep molecular characterization of this heterogeneity in both primary and metastatic samples. To this end, we built a manually curated HGSOC transcriptomics atlas (~1.5 million cells, 79 patients) based on publicly available datasets and developed a data integration strategy specifically applicable to very heterogeneous single-cell data. scRNA-seq data integration is a central methodological challenge in single-cell data analysis, particularly for highly heterogeneous datasets, where the main source of variability is the one existing among patients. Previously developed integration methods tend to perform poorly on cancer data because they mainly rely on the assumption that the cell populations present in different samples are similar, which may not hold for unknown, rare, or patient-specific cell populations. Here we present a novel integration approach for patient-derived cancer data. To obtain a more faithful representation of the data, we identified cell populations and derived metacells separately for every sample, so as to capture the distinct subpopulations characterizing each patient. In addition, to robustly represent the space describing the three main cell populations of the dataset (tumoral, immune, and stromal), for each population we selected the space defined by the union of highly variable genes, computed separately on every dataset, to preserve the variability of the system. Next, we fed the metacell data to a variational autoencoder to integrate and characterize the different subpopulations that may be similar across samples.
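The gene-space selection step described above (highly variable genes computed separately per dataset, then merged by union) can be sketched as follows. This is a minimal illustration, not the atlas pipeline: the function names are hypothetical, plain variance ranking stands in for whatever highly-variable-gene criterion was actually used, and zero-filling genes absent from a dataset is an assumption made only to keep the example self-contained.

```python
import numpy as np

def top_variable_genes(X, genes, n_top):
    """Return the n_top gene names with the highest variance across cells.

    Assumption: plain variance ranking is used here for illustration only;
    it is not necessarily the criterion used to build the atlas.
    """
    variances = X.var(axis=0)
    order = np.argsort(variances)[::-1][:n_top]
    return {genes[i] for i in order}

def union_of_hvgs(datasets, n_top):
    """Compute variable genes separately per dataset, then take their union."""
    union = set()
    for X, genes in datasets:
        union |= top_variable_genes(X, genes, n_top)
    return sorted(union)

def restrict_to_genes(X, genes, selected):
    """Subset a cells x genes matrix to the selected genes.

    Assumption: genes missing from a dataset are zero-filled.
    """
    index = {g: i for i, g in enumerate(genes)}
    out = np.zeros((X.shape[0], len(selected)), dtype=X.dtype)
    for j, g in enumerate(selected):
        if g in index:
            out[:, j] = X[:, index[g]]
    return out

# Toy data: two datasets (cells x genes) whose variability sits in different genes.
genes = ["A", "B", "C", "D"]
ds1 = np.array([[0.0, 1.0, 5.0, 2.0],
                [10.0, 1.0, 5.0, 2.0],
                [0.0, 1.0, 5.0, 3.0]])
ds2 = np.array([[1.0, 0.0, 5.0, 2.0],
                [1.0, 8.0, 5.0, 2.0],
                [1.0, 0.0, 5.0, 2.0]])

shared_space = union_of_hvgs([(ds1, genes), (ds2, genes)], n_top=1)
ds1_shared = restrict_to_genes(ds1, genes, shared_space)
ds2_shared = restrict_to_genes(ds2, genes, shared_space)
```

In this toy case gene A varies only in the first dataset and gene B only in the second, so the union keeps both, whereas a selection computed on the pooled data or an intersection across datasets could drop a patient-specific signal.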
After integration, we were able to identify new robust gene signatures that would otherwise have been masked by dataset heterogeneity. The accurate manual curation and harmonization of metadata in the HGSOC atlas, together with the robustness and scalability of the data integration pipeline, allowed us to obtain an in-depth characterization of the tumor subpopulations involved in therapy response by characterizing chemotherapy-induced transcriptional features. At the same time, thanks to the atlas-associated cell type labelling strategy, this atlas represents a transformative resource for the community by enabling the investigation of the interaction between tissue microenvironment (TME) cells and tumoral cells in shaping the metastatic process. This innovative HGSOC atlas will provide the ovarian cancer research community with a very powerful tool to investigate many aspects of this disease. The robust deep learning-based framework allows the atlas to be expanded easily and iteratively with newly generated datasets (out-of-sample extension) and will constitute an invaluable resource for the HGSOC field in tackling HGSOC pathogenetic mechanisms.
Citation Format: Marta R. Sallese, Pietro Lo Riso, Carlo Emanuele Villa, Giuseppe Testa. An ovarian cancer scRNA-seq atlas to dissect tumor-host interactions underlying metastatization and chemoresistance [abstract]. In: Proceedings of the AACR Special Conference on Ovarian Cancer; 2023 Oct 5-7; Boston, Massachusetts. Philadelphia (PA): AACR; Cancer Res 2024;84(5 Suppl_2):Abstract nr A087.