Multiple myeloma (MM) patients have highly variable overall survival (OS), ranging from a few weeks to more than ten years. Discovering an early biosignature that separates short-term from long-term survivors offers the prospect of treating at-risk patients. Machine learning (ML) algorithms are currently being tested to discover biosignatures, but the resulting signatures tend to comprise many features, which makes their implementation in healthcare an arduous task. Here, we have developed an algorithm called AlgoOS to stratify newly diagnosed MM (NDMM) patients by integrating NetRank, a variation of the Google PageRank algorithm, with ML algorithms, the first approach of its kind in MM. We also built a dataset of NDMM patients (n=31) consisting of transcriptomic (features=28256), clinical (features=13), biochemical (features=12), and fluorescence in situ hybridization (FISH) (features=3) data. Based on domain knowledge, an OS cut-off of 46 months was used to separate short-term from long-term survivors. Finally, AlgoOS was applied to this dataset, and a biosignature predictive of NDMM patient stratification was extracted. The prediction model's performance was evaluated by accuracy, precision, and F1-score using 5-fold cross-validation. R was used to build a transcription-factor-gene regulatory network, while all other analyses, including ML, were performed in Python. During the first step of AlgoOS, all transcriptomic features were ranked by NetRank score, and the top 20 were selected for further processing. This ranking is similar to the web page ranking performed by the Google PageRank algorithm, except that NetRank also takes into account the correlation of features with OS. In detail, we calculated the NetRank scores of transcriptomic features by building a transcription-factor-gene regulatory network using the JASPAR-v2022 database.
Each transcription factor motif in JASPAR was matched to the putative promoter region (1000 base pairs upstream) of genes in the hg38 human reference genome. Furthermore, correlations of transcriptomic features with OS were computed, and the NetRank score was calculated while iteratively optimizing the damping factor d (see Figure 1). We trained a support vector machine (SVM) on the top 10 to 20,000 NetRank-ranked transcriptomic features, trained it separately on the same number of randomly selected features, and calculated the performance scores of both. Models were selected by three criteria: 1) precision and accuracy ≥ 80%, 2) a non-linear kernel, and 3) C < 1, where C is the regularization parameter. The top 20 NetRank-ranked features were the most predictive of patient stratification. They outperformed random feature selection (accuracy = 90% vs. 62%, precision = 100% vs. 44%, F1-score = 91% vs. 51%). The result was also validated with a second classifier, random forest (RF) (accuracy = 70% vs. 44%, precision = 71% vs. 32%, F1-score = 77% vs. 44%). During the second step of AlgoOS, the importance of the 20 features in the RF model was calculated, which showed that leaving out 2 of them did not affect performance. Therefore, the remaining 18 transcriptomic features were integrated with clinical, biochemical, and FISH data, and a final RF was trained. Data integration improved the performance of the final RF compared with the previous RF model trained only on transcriptomic data (accuracy = 90% vs. 70%, precision = 100% vs. 71%, F1-score = 91% vs. 77%). In this way, we obtained a novel biosignature to stratify short-term from long-term survivors.
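The network-based ranking described above can be illustrated with a minimal Python sketch. It assumes that the NetRank update takes the personalized-PageRank form r_j = (1 - d) * s_j + d * sum_i(r_i / outdeg_i), where s_j is the normalized absolute correlation of feature j with OS and the sum runs over network neighbors pointing to j. The function name `netrank`, the edge direction (transcription factor to target gene), and the toy data are illustrative assumptions, not the authors' implementation.

```python
def netrank(edges, corr, d=0.5, n_iter=100, tol=1e-9):
    """Personalized-PageRank-style scores (illustrative sketch only).

    edges: iterable of (source, target) pairs, e.g. TF -> target gene
    corr:  dict mapping node -> correlation of its expression with OS
    d:     damping factor in [0, 1]; d = 0 ranks by correlation alone
    """
    nodes = sorted(set(corr) | {n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Normalize absolute correlations so they sum to 1 (the "personalization").
    total = sum(abs(corr.get(n, 0.0)) for n in nodes)
    if total == 0:
        raise ValueError("at least one node needs a non-zero correlation")
    s = {n: abs(corr.get(n, 0.0)) / total for n in nodes}

    r = dict(s)  # start the iteration from the correlation vector
    for _ in range(n_iter):
        new_r = {n: (1 - d) * s[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if not targets:
                continue  # assumption: nodes without targets pass nothing on
            share = d * r[src] / len(targets)
            for dst in targets:
                new_r[dst] += share
        delta = sum(abs(new_r[n] - r[n]) for n in nodes)
        r = new_r
        if delta < tol:
            break
    return r


# Toy network with made-up correlations (not data from the study).
toy_edges = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneB")]
toy_corr = {"TF1": 0.1, "TF2": 0.2, "geneA": 0.6, "geneB": 0.1}

scores = netrank(toy_edges, toy_corr, d=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the damping factor d sets the balance between network connectivity and correlation with OS. Tuning d would amount to repeating the ranking for several values of d and keeping the one whose top-ranked features give the best cross-validated classifier.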
The biosignature consisted of only 29 features: eighteen transcriptomic (TMEM62, BUB1, CXorf21, PLEKHM1, PRPF18, ULK2, VMP1, LONRF1, USP15, TBC1D5, NUP93, CASP4, NDFIP1, TPM1, NRXN2, ALG9, C1QC, and SOX13), seven biochemical (LDH, albumin, creatinine, calcium, M-proteins, Hb light chains, Hb heavy chains), three clinical (age, gender, ISS stage), and one FISH (17p13) feature. LDH, TMEM62, and BUB1 were the three most important features (see Table 1). Bioinformatics analysis of the 18 transcriptomic features in the biosignature showed an enrichment for autophagy (autophagic cell death, p-value = 9.8×10⁻³; autophagosome, p-value = 1.5×10⁻²), a known mechanism of resistance to drugs and to cell death in MM. Our findings indicate that creating a biosignature from transcriptomic features in addition to previously used prognostic factors may improve the prediction of outcome in MM. The model needs to be validated in a larger patient cohort.
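For reference, the survival grouping and the three reported metrics can be written out in a few lines of Python. This is a hedged sketch: the helper names, the choice of long-term survivors as the positive class, the handling of patients exactly at the 46-month cut-off, and the toy values are all assumptions made for illustration and are not taken from the study.

```python
def label_by_cutoff(os_months, cutoff=46):
    """Return 1 for long-term survivors (OS >= cutoff, an assumption) and 0 otherwise."""
    return [1 if months >= cutoff else 0 for months in os_months]


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up survival times in months and made-up predictions (not study data).
toy_os = [12, 30, 50, 80, 46, 20]
y_true = label_by_cutoff(toy_os)   # [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
toy_scores = classification_scores(y_true, y_pred)
```

In a 5-fold cross-validation, these scores would be computed on each held-out fold and then averaged.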