Multiple myeloma (MM) presents as a multi-system disease with significant derangements in the immune system, hematopoiesis, renal function, bone metabolism, and electrolyte homeostasis. End-organ damage is the key factor that differentiates plasma cell (PC) dyscrasia requiring therapy, i.e., active MM, from the precursor stages (monoclonal gammopathy of undetermined significance, MGUS, and smoldering MM, SMM) that do not require therapy. The prevailing assumption is that MGUS and SMM patients do not harbor a significant or measurable systemic impact of clonal PCs. Therefore, a watch-and-wait approach is employed in MGUS and SMM patients until end-organ damage is evident. Here, we hypothesized that the systemic impact of clonal PCs is a continuum throughout disease progression rather than a binary process, and that the impact on organ systems correlates directly with the volume of the malignant clone as measured by monoclonal protein levels. To capture the systemic impact of PC clones, we therefore used a machine learning (ML) approach that incorporates all available clinical and laboratory variables.

Methods: A total of 171 patients with PC dyscrasias, contributing 1,472 observations, were analyzed. Forty-three clinical and laboratory variables were identified as predictors of M-spike protein levels and fed into the ML model. Two lagged variables, the two most recent preceding M-spike values for the same subject, were also included. Where needed, missing values were imputed by interpolation from a subject-level linear trend analysis. A random forest regression model was used; a regression forest is an ensemble of regression trees suited to nonlinear multiple regression. The number of trees was set to the default, n = 500, and 13 randomly selected variables were considered at each split. A large number of trees was used so that each feature had a chance to appear in several trees.
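The lagged-variable and imputation steps described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' R code. It assumes one reading of "interpolation from subject-level linear trend analysis", namely a least-squares line fitted to each subject's observed values over time. The function names, the choice to leave gaps untouched when fewer than two values are observed, and the choice to drop a subject's first two visits (which have no two preceding M-spike values) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def impute_linear_trend(times: List[float],
                        values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing (None) values from a least-squares line fitted to one
    subject's observed (time, value) pairs. Assumed reading of the abstract."""
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if len(observed) < 2:
        return list(values)  # too few points for a trend; leave gaps as-is
    n = len(observed)
    mean_t = sum(t for t, _ in observed) / n
    mean_v = sum(v for _, v in observed) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in observed)
    if sxx == 0:
        return list(values)  # all observations at the same time point
    slope = sum((t - mean_t) * (v - mean_v) for t, v in observed) / sxx
    intercept = mean_v - slope * mean_t
    return [v if v is not None else intercept + slope * t
            for t, v in zip(times, values)]


def add_lagged_m_spike(m_spike: List[float]) -> List[Tuple[float, float, float]]:
    """Build (lag1, lag2, current) rows from one subject's time-ordered
    M-spike series. The first two visits are dropped (an assumption)."""
    return [(m_spike[i - 1], m_spike[i - 2], m_spike[i])
            for i in range(2, len(m_spike))]


if __name__ == "__main__":
    # Hypothetical single-subject data: visit times and a lab value with a gap.
    print(impute_linear_trend([0, 1, 2, 3], [1.0, None, 3.0, 4.0]))
    print(add_lagged_m_spike([0.5, 0.7, 0.9, 1.2]))
```

In a full pipeline these helpers would be applied per subject before the rows are pooled, so that neither the trend fit nor the lags ever mix values from different patients.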
The data were randomly split into a training set (N = 136, 80%) and a test set (N = 35, 20%); a regression tree was built with the training set and then validated using the test set. Bootstrapping was used to generate a collection of data sets (n = 500), leading to a random forest of regression trees. Results and estimates were combined across trees. Variable importance was measured by leaving a covariate out of the models and comparing performance with and without it. All analyses were performed using R v3.6.2 and its libraries.

Results: The median age of patients in the study cohort was 73 years (range: 42-96), and 44% were male. The median M-spike value was 0.7 g/dL (range: 0.1-3.5). Fig. 1 shows the number of observations and the magnitude distribution of M-spike levels among the patients included in our study. As shown in Fig. 2, the M-spike levels predicted by the ML algorithm (the calculated M-spike) correlated highly with the observed M-spike levels in the test set (R-squared: 94%; RMSE: 0.21). The Pearson and Spearman coefficients were 0.97 and 0.95, respectively. Fig. 3b shows the residual distribution for the RF model; most values lie close to zero and fall on both sides of it.

Conclusions: Here, we assessed the multi-system impact of the plasma cell clone using an ML algorithm. Our results show that the degree of systemic involvement induced by the clone correlates directly with the magnitude of the ML-calculated M-spike value. Applying this algorithm in longitudinal studies of patients with MGUS and SMM is warranted to detect the optimal point of multi-systemic impact of clonal PCs at which to start therapy.
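The agreement statistics reported for the test set (R-squared, RMSE, Pearson, and Spearman) can be computed as in the sketch below. It is written in standard-library Python for illustration and is not the authors' R code. The abstract does not say how R-squared was defined; the sketch assumes the coefficient of determination, 1 - SS_res/SS_tot, and it is worth noting that the squared Pearson coefficient, 0.97^2 ≈ 0.94, matches the reported figure as well. All function names and the example values are hypothetical.

```python
import math
from typing import List


def rmse(observed: List[float], predicted: List[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def r_squared(observed: List[float], predicted: List[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical observed and predicted M-spike values, for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.8]
    print(rmse(obs, pred), r_squared(obs, pred),
          pearson(obs, pred), spearman(obs, pred))
```

Reporting both Pearson and Spearman, as the abstract does, separates linear agreement from agreement in rank order, and RMSE expresses the typical prediction error in the units of the M-spike itself.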