Development of a Machine Learning Algorithm for Rapid, Point-of-Care Prediction of Serum Monoclonal Proteins in Multiple Myeloma

Ehsan Malek, Jeries Kort, Gi-Ming Wang, Paolo F Caimi, Kirsten M Boughan, Stanton L Gerson, Brenda W Cooper, Molly M Gallogly, Benjamin K Tomlinson, Marcos De Lima, Curtis Tatsuoka, James Driscoll

doi:10.1182/blood-2020-139733

Abstract

Multiple myeloma (MM) is a cancer of terminally differentiated plasma cells residing in the bone marrow. Myeloma cells frequently secrete monoclonal proteins that can be used to assess tumor volume and patient response to therapy. Monoclonal proteins are measured by gel electrophoresis, followed by immunofixation of the observed M-spike for protein typing. However, this is a time-consuming process that may take 3-5 days, which delays physician-patient decision-making and assessment of response to treatment and can be a significant psychological stressor for patients. Hence, there is an unmet need for a more rapid, point-of-care method to determine M-spike levels. The gamma gap is the difference between total serum protein and albumin; in addition to the M-spike, it includes a variety of metabolic proteins, e.g., transferrin, as well as immunologic proteins, e.g., non-involved immunoglobulins. Since the non-M-spike portion of the gamma gap cannot be estimated in routine patient care, the gamma gap cannot serve as an accurate surrogate for M-spike protein levels. Here, we hypothesized that an artificial intelligence (AI) algorithm utilizing readily available clinical and laboratory data, along with previous and same-day lab variables, can accurately predict M-spike levels without the need for serum electrophoresis.

Methods: A total of 171 MM patients with 1,472 observations were included in the study; the upper limit of the observed M-spike was 3.5 g/dL. Correlation of the observed M-spike with the gamma gap was assessed using the Pearson and Spearman correlation coefficients. Forty-three clinical and lab variables (including total serum protein and albumin) were fed into the machine learning model as predictors of M-spike. Two lagged variables, the two most recent preceding M-spike values for the same subject, were also included. When needed, missing values were imputed by interpolation from a subject-level linear trend analysis.
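The gamma gap, the two correlation measures, and the two lagged M-spike variables described above can be sketched in a few lines. The authors report using R; the Python below is only an illustration, and the function names (`gamma_gap`, `pearson`, `ranks`, `spearman`, `lagged_m_spikes`) and all numeric values are hypothetical, not taken from the study.

```python
from math import sqrt

def gamma_gap(total_protein, albumin):
    """Gamma gap = total serum protein minus albumin (both in g/dL)."""
    return total_protein - albumin

def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def lagged_m_spikes(history, n_lags=2):
    """For each visit, the n_lags preceding M-spike values (None if absent)."""
    return [
        [history[i - k] if i - k >= 0 else None for k in range(1, n_lags + 1)]
        for i in range(len(history))
    ]

# Made-up values for one hypothetical patient, for illustration only.
total_protein = [7.9, 8.4, 6.8, 9.1, 7.2]
albumin = [3.9, 3.6, 4.1, 3.4, 4.0]
m_spike = [1.1, 2.0, 0.3, 2.9, 0.6]

gaps = [gamma_gap(tp, alb) for tp, alb in zip(total_protein, albumin)]
print("Pearson:", round(pearson(gaps, m_spike), 3))
print("Spearman:", round(spearman(gaps, m_spike), 3))
print("Lagged M-spikes:", lagged_m_spikes(m_spike))
```

The `None` entries mark visits with fewer than two prior measurements; how the study handled those rows is not stated in the abstract.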
A random forest model was used; regression forests are an ensemble of different regression trees and are used for nonlinear multiple regression. The number of trees was left at the default of n = 500, and the number of variables randomly selected as candidates at each split was 13. The goal of using a large number of trees was to ensure that each feature had a chance to appear in several models. The data were randomly split into a training set (80%) and a test set (20%); regression trees were built with the training set and then validated using the test set. Bootstrapping was used to generate a collection of data sets (n = 500), leading to a random forest of regression trees. Results and estimates were combined across trees. Variable importance was measured by comparing model performance with and without each covariate. All analyses were performed using R v3.6.2 and its libraries.

Results: The median age of the study cohort was 73 years (range: 42-96), and 44% were male. The median M-spike value was 0.7 g/dL (range: 0.1-3.5). Fig. 1 shows the number of observations and the magnitude distribution of M-spike levels among the patients included in our study. The correlation between the calculated gamma gap and observed M-spike levels was assessed by two methods (Fig. 2). The Pearson coefficient was 0.43 for M-spike levels <1 g/dL and 0.72 for M-spike levels >1 g/dL (Fig. 2a). The Spearman coefficient was 0.41 for M-spike levels <1 g/dL and 0.74 for M-spike levels >1 g/dL, suggesting a low overall correlation, especially for M-spike levels <1 g/dL (Fig. 2b). In contrast, as shown in Fig. 3, M-spike levels predicted by the AI algorithm (i.e., fitted M-spike in the test set) correlated highly with the observed M-spike levels in the test set (R-squared: 94%; RMSE: 0.21). The Pearson and Spearman coefficients were 0.97 and 0.95, respectively. Fig. 3b shows the residual distribution for the RF model, with most values close to zero and distributed on both sides of it.

Conclusion: Here, we showed that the difference between total protein and albumin (i.e., the gamma gap) is only a rough estimate of M-spike, especially at lower values. An AI algorithm trained on 43 readily available clinical and laboratory variables predicted the observed M-spike with high accuracy in the test set. Taken together, our results indicate that the AI-based method developed here can be further advanced for rapid, accurate, point-of-care measurement of M-spike protein levels in MM patients.

Figure 1

Disclosures

Malek: Cumberland: Research Funding; Sanofi: Other: Advisory board; Celgene: Other: Advisory board, Speakers Bureau; Takeda: Other: Advisory board, Speakers Bureau; Janssen: Other: Advisory board, Speakers Bureau; Bluespark: Research Funding; Amgen: Honoraria; Medpacto: Research Funding. Caimi: Amgen: Other: Advisory Board; Verastem: Other: Advisory Board; Celgene: Speakers Bureau; Bayer: Other: Advisory Board; ADC Therapeutics: Other: Advisory Board, Research Funding; Kite Pharma: Other: Advisory Board. de Lima: Celgene: Research Funding; Pfizer: Other: Personal fees, advisory board, Research Funding; Kadmon: Other: Personal Fees, Advisory board; Incyte: Other: Personal Fees, advisory board; BMS: Other: Personal Fees, advisory board.
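The modelling procedure described in the Methods (an 80/20 random split, bootstrap resampling, a random subset of candidate variables at each split, averaging across trees, and evaluation by RMSE and R-squared) can be illustrated with a deliberately small, standard-library-only sketch. This is not the authors' implementation: the abstract does not name the R package used, and every function name, the tree depth limit, and the minimum node size below are assumptions made for illustration.

```python
import random
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def split_indices(n, rng):
    """Random 80/20 split of row indices into (train, test)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = n * 4 // 5
    return idx[:cut], idx[cut:]

def best_split(X, y, features):
    """Return (sse, feature, threshold) with the lowest squared error, or None."""
    best = None
    for f in features:
        # Dropping the largest value keeps both sides of the split non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    return best

def build_tree(X, y, mtry, rng, depth=0, max_depth=4, min_size=5):
    """Grow one regression tree; max_depth and min_size are assumed limits."""
    if depth >= max_depth or len(y) < min_size:
        return mean(y)
    # Random subset of candidate variables at this split.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, features)
    if split is None:
        return mean(y)
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    grow = lambda ids: build_tree([X[i] for i in ids], [y[i] for i in ids],
                                  mtry, rng, depth + 1, max_depth, min_size)
    return (f, t, grow(li), grow(ri))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=500, mtry=13, seed=0):
    """One tree per bootstrap sample; defaults mirror the reported 500 and 13."""
    rng = random.Random(seed)
    m = min(mtry, len(X[0]))
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Combine estimates across trees by averaging."""
    return mean([predict_tree(tree, row) for tree in forest])

def rmse(observed, predicted):
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot (assumes the observed values are not all equal)."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

To use the sketch, one would call `split_indices` on the rows of a predictor matrix, fit with `fit_forest` on the training rows, and compare `predict_forest` output with the held-out M-spike values via `rmse` and `r_squared`. A production analysis would normally rely on an established random forest library rather than hand-written trees.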
