Introduction: Prognosis and response to therapy vary substantially among multiple myeloma (MM) patients. This diversity is influenced by both intrinsic tumor features and the tumor microenvironment. We hypothesize that clinical outcomes after immunomodulatory drug (IMiD) and/or proteasome inhibitor (PI) therapy are linked to molecular subtypes defined by tumor sample transcriptomics. We previously showed that molecular subtyping based on the CoMMpass bulk RNAseq data is associated with time to second-line therapy (log rank p = 0.0018; Hamidi et al., Blood 140: 7208-7210). Here, we applied the same molecular subtyping schema to plasma cell transcriptomic profiles isolated from single-cell RNAseq (scRNAseq) data.

Methods: Bone marrow biopsies from newly diagnosed MM patients (N = 200) in the MMRF CoMMpass study (NCT01454297) were collected and enriched by CD138 negative bead selection, and scRNAseq was performed by the MMRF Immune Profiling Consortium, a collaboration between MMRF and laboratories at Beth Israel Deaconess Medical Center, Icahn School of Medicine at Mt. Sinai, Mayo Clinic, Washington University in St. Louis, and Emory University. After quality control and unsupervised clustering, plasma cells were extracted, and pseudo-bulk expression profiles were generated for baseline samples with more than 100 plasma cells (N = 95). An unsupervised machine learning approach based on consensus non-negative matrix factorization (cNMF) was applied to identify the transcriptomics-based molecular subtypes. A random forest model, built from the scRNAseq-based molecular subtypes (scNMF), was applied to bulk RNAseq data with a larger sample size (N = 754) to train a predictive molecular subtyping scheme (pNMF). Potential cell-cell communication events based on scRNAseq were also predicted with the LIgand-receptor ANalysis frAmework (LIANA).

Results: A total of 705,959 single cells passed quality control and were broadly annotated as T/NK cells, B cells, plasma cells, myeloid cells, erythroid cells, progenitor cells, and stromal cells.
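The pseudo-bulk step described in the Methods can be illustrated with a minimal sketch. The abstract does not state how per-cell expression was aggregated, so summing raw counts per sample is an assumption here, as are the function name `pseudo_bulk`, the `(sample_id, cell_type, counts)` input layout, and the `"plasma"` label. Only the "more than 100 plasma cells" threshold comes from the text.

```python
from collections import defaultdict

# Threshold stated in the abstract: samples need more than 100 plasma cells.
MIN_PLASMA_CELLS = 100


def pseudo_bulk(cells, min_cells=MIN_PLASMA_CELLS):
    """Aggregate plasma-cell counts into one profile per sample.

    cells: iterable of (sample_id, cell_type, {gene: count}) tuples.
           This layout is hypothetical, chosen only for illustration.
    Returns {sample_id: {gene: summed_count}} for samples that have
    strictly more than `min_cells` plasma cells.
    """
    n_plasma = defaultdict(int)
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        if cell_type != "plasma":  # keep only cells annotated as plasma cells
            continue
        n_plasma[sample_id] += 1
        for gene, count in counts.items():
            # Assumption: pseudo-bulk = sum of raw counts across cells.
            totals[sample_id][gene] += count
    return {
        sample: dict(genes)
        for sample, genes in totals.items()
        if n_plasma[sample] > min_cells
    }


if __name__ == "__main__":
    # Toy data: sample "A" has 101 plasma cells, sample "B" has exactly 100.
    toy = [("A", "plasma", {"SDC1": 2})] * 101
    toy += [("B", "plasma", {"SDC1": 2})] * 100
    toy += [("A", "T/NK", {"CD3E": 5})] * 10
    print(pseudo_bulk(toy))
```

In this toy input only sample "A" is retained, because sample "B" does not exceed the threshold and non-plasma cells are ignored.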
Consistent with previous reports, CD138 negative bead selection did not completely remove plasma cells, resulting in an average of 18.5% plasma cells (n = 130,213). Subsequent unsupervised clustering of plasma cells showed a patient-specific clustering pattern, suggestive of the heterogeneity of these tumor-derived cells. Unsupervised NMF on the pseudo-bulk expression profiles of baseline plasma cells identified five patient subsets. In the same cohort of patients (N = 95), subtypes assigned from single-cell data had a stronger association with time to second-line therapy than subtypes assigned from bulk data (p = 0.0027 for scNMF compared to p = 0.13 for bulk NMF). The median time to second-line therapy was 2,443 days in scNMF5.1 versus 585 days in scNMF5.4, compared to 1,597 days in NMF5.2 versus 850 days in NMF5.3. The predicted subtyping based on scRNAseq data (pNMF) had a stronger association with time to second-line therapy (Figure 1A, log rank p = 0.00088) than the published molecular subtyping based on bulk RNAseq (log rank p = 0.0018). The median time to second-line therapy was 1,342 days in pNMF5.1 versus 597 days in pNMF5.5, compared to 1,347 days in NMF5.2 versus 850 days in NMF5.3. The subset with the best clinical outcome, pNMF5.1, was characterized by higher levels of IFN pathways (Figure 1B, Hallmark interferon gamma and alpha pathways, Wilcoxon p = 0.003 and p = 0.010, respectively) and higher levels of a tumor-promoting inflammatory microenvironment (Hallmark inflammatory response and TNFα signaling via NF-κB, p = 0.003 and p = 0.010, respectively). The subset with the worst outcome, pNMF5.5, on the other hand, displayed higher levels of glycolysis, which promotes tumor development (p < 0.001). Moreover, the total number of cell-cell interactions between the major cell types differed among scRNAseq subtypes, informing the underlying biology driving the molecular heterogeneity.
We identified a positive feedback loop of CXCL12/CXCR4 signaling between stromal and plasma cells that enhances tumor cell proliferation.

Conclusions: Identifying molecular subtypes from single-cell data with an unsupervised machine learning approach can have significant implications for determining which newly diagnosed MM (NDMM) patients are likely to respond to established therapies. In addition, the cellular-level resolution allows deconvolution of cell-cell signaling, enhancing our understanding of the underlying molecular biology and mechanisms of resistance.
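The log rank p-values reported above compare time to second-line therapy across subtypes. As a hedged illustration of how such a statistic is computed, the sketch below implements the standard two-group log-rank test in plain Python. It is not the authors' analysis: the abstract compares five subtypes and does not name the software used, and the function name `logrank_two_groups` and the toy follow-up times are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times (e.g. days to second-line therapy or censoring)
    events_*: 1 if the event was observed, 0 if the observation was censored
    Returns (chi2, p_value) for a chi-square statistic with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)          # events at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical toy data: group A progresses later than group B.
    chi2, p = logrank_two_groups(
        [10, 11, 12, 13, 14], [1, 1, 1, 1, 1],
        [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    )
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A comparison across all five subtypes would use the multi-group generalization of this test, which has four degrees of freedom rather than one.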