Myelofibrosis (MF) is a myeloproliferative neoplasm (MPN) with survival outcomes ranging from months to years and variable risk of transformation to acute myeloid leukemia. Allogeneic bone marrow transplantation (BMT) can be curative but is associated with high treatment-related morbidity and mortality; accurate risk stratification is therefore important to guide clinical decision-making in MF. Current risk prediction models use clinical and/or genomic features but do not consider the properties of the disease-driving stem cell population. Here, we use transcriptomic variation corresponding to both intra- and inter-patient heterogeneity among MF stem cells to generate novel gene expression-based scores predictive of survival and leukemic transformation in MF. To train and validate these prognostic scores, we identified 358 patients from an MPN registry at the Princess Margaret Cancer Centre (ClinicalTrials.gov Identifier: NCT02760238) from whom peripheral blood (PB) cells were collected near the date of MF diagnosis. All patients were diagnosed with primary, post-polycythemia vera (post-PV), post-essential thrombocythemia (post-ET) or pre-fibrotic MF, with clinical follow-up of up to 12.2 years. RNA was extracted from unsorted PB mononuclear cells, and RNA sequencing (RNAseq) was performed at an average depth of 50 million reads per sample. We randomly split our MF cohort into training (70%; n=250) and test (30%; n=108) sets and used repeated nested cross-validation together with statistical regression to generate models predicting survival and to assess their performance within the training set. We tested 36,000 models derived from 36 initial MF-related gene sets, ranging from stem cell-specific gene sets to the whole transcriptome.
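The cohort split and repeated nested cross-validation described above can be sketched as follows. This is a minimal illustration only: the fold counts, number of repeats, random seed, and patient identifiers are assumptions and are not taken from the study, and the model fitting and scoring steps are omitted.

```python
import random

def train_test_split_ids(ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and test lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor: 358 patients -> 250
    return shuffled[:n_train], shuffled[n_train:]

def k_folds(ids, k, rng):
    """Yield (fit_ids, held_out_ids) pairs for one shuffled k-fold pass."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, held_out

def repeated_nested_cv(train_ids, outer_k=5, inner_k=5, repeats=2, seed=0):
    """Yield (repeat, outer_fit, outer_eval, inner_splits) tuples.

    inner_splits are meant for model selection only; outer_eval is meant
    for performance estimation only, so the two never share patients.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        for outer_fit, outer_eval in k_folds(train_ids, outer_k, rng):
            inner_splits = list(k_folds(outer_fit, inner_k, rng))
            yield r, outer_fit, outer_eval, inner_splits

# Placeholder identifiers, not real patient data.
patients = [f"P{i:03d}" for i in range(358)]
train, test = train_test_split_ids(patients)
splits = list(repeated_nested_cv(train))
print(len(train), len(test), len(splits))
```

The test set is never passed to `repeated_nested_cv`, which mirrors the design in which all model generation and selection happen inside the training set.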
The most accurate models by cross-validation (median multivariable p-value = 6e-5) were produced from our retrospective identification of highly variable genes in single-cell RNAseq data derived from 82,255 Lin-CD34+ MF stem and progenitor cells across 15 patients (Psaila et al., 2020). Thus, features of intra- and inter-patient heterogeneity among MF stem and progenitor cells proved to be the most relevant for predicting survival. From these features, we derived our final model, calculated as the weighted sum of gene expression across 24 genes (termed MPN24). We categorized patients with MPN24 scores above or below the training cohort median as MPN24 high or low, respectively. This model was validated in the test set, with low- and high-score patients experiencing 5-year survival rates of 71% [95% CI 57%-88%] and 21% [95% CI 9%-52%], respectively, when censored at time of BMT (HR = 5.3 [95% CI 2.6-10.5]; p = 2.1e-6) (Figure 1A). MPN24 retained independent prognostic value in multivariable analysis incorporating age, sex, DIPSS category, ECOG status, fibrosis grade, constitutional symptoms, and PB blast percentage (adjusted HR = 5.7 [95% CI 2.2-14]; p = 3e-4). Importantly, DIPSS classification remained a significant covariate, indicating that MPN24 and DIPSS capture distinct features of disease. We therefore developed a new three-tier classification scheme integrating both DIPSS and MPN24 scores (Figure 1B). Patients classified as low-, intermediate- or high-risk in this new classification scheme experienced 5-year survival rates of 88.2% [95% CI 77.9%-99.9%], 39.3% [95% CI 19.9%-77.7%] and 10.8% [95% CI 2.1%-55.8%], respectively (likelihood ratio test p = 1e-8). Finally, from the MPN24 genes we derived a 13-gene subscore (MPN13) predictive of leukemic transformation in the training set. Patients in the test set scoring above the 80th percentile of the training set were classified as high risk and the remainder as low risk.
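The scoring and thresholding steps described above (a weighted sum of gene expression, dichotomized at a cutoff fixed in the training cohort) can be sketched as follows. The gene names, weights, and expression values are hypothetical placeholders, not the MPN24 or MPN13 coefficients, and the linear-interpolation percentile is one common convention that is assumed here rather than taken from the study.

```python
from statistics import median

def weighted_score(expression, weights):
    """Weighted sum of expression values over the genes listed in `weights`."""
    return sum(w * expression[gene] for gene, w in weights.items())

def percentile_cutoff(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def classify(score, cutoff):
    """Label a score relative to a cutoff learned from the training cohort."""
    return "high" if score > cutoff else "low"

# Hypothetical three-gene model and toy training cohort (illustration only).
weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
training = [
    {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 3.0},
    {"GENE_A": 6.0, "GENE_B": 4.0, "GENE_C": 0.0},
]
training_scores = [weighted_score(p, weights) for p in training]

# Cutoffs are fixed on the training cohort and then applied unchanged.
median_cutoff = median(training_scores)
p80_cutoff = percentile_cutoff(training_scores, 80)

new_patient = {"GENE_A": 2.0, "GENE_B": 0.0, "GENE_C": 2.0}
new_score = weighted_score(new_patient, weights)
print(classify(new_score, median_cutoff), classify(new_score, p80_cutoff))
```

Fixing both cutoffs on the training cohort keeps the test-set classification free of information from the test set itself.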
Although the number of leukemic transformation events was limited, MPN13 was significantly associated with risk of transformation (p = 4.7e-3), with low- and high-risk patients experiencing 3-year cumulative incidences of transformation of 5.2% [95% CI 0.2%-10.2%] and 28.6% [95% CI 3.1%-54.0%], respectively, after adjusting for death as a competing risk. Future studies will transfer our scores to the Nanostring platform for testing in independent cohorts. In summary, we used transcriptional variation among MF stem and progenitor cells to derive novel gene expression scores predictive of survival and leukemic transformation and developed a new integrated three-tier model for predicting risk in MF patients.
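Cumulative incidence with death as a competing risk, as reported above for leukemic transformation, can be illustrated with the nonparametric Aalen-Johansen estimator. This is a minimal sketch under assumptions: the abstract does not name the estimator or software used, the event coding (0 = censored, 1 = transformation, 2 = death) is chosen here for illustration, and the follow-up times are toy values rather than study data.

```python
def cumulative_incidence(times, events, cause, horizon):
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    times   : follow-up time for each patient
    events  : 0 = censored, any other integer = event type
    cause   : event type of interest
    horizon : time point at which the incidence is reported
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        tied = [e for (u, e) in data[i:] if u == t]  # all records at time t
        d_cause = sum(1 for e in tied if e == cause)
        d_any = sum(1 for e in tied if e != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return cif

# Toy follow-up in years: 1 = transformation, 2 = death, 0 = censored.
times = [1, 2, 3, 4, 5]
events = [1, 2, 0, 1, 0]
cif_transformation = cumulative_incidence(times, events, cause=1, horizon=5)
cif_death = cumulative_incidence(times, events, cause=2, horizon=5)
print(round(cif_transformation, 3), round(cif_death, 3))
```

Treating deaths as censored observations and reporting one minus the Kaplan-Meier estimate would overstate transformation risk in this toy example, because patients who die can no longer transform.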