Background. Targeted therapies, including Bruton tyrosine kinase (BTK) inhibitors such as ibrutinib, have transformed the treatment of patients with chronic lymphocytic leukemia (CLL), improving overall survival and progression-free survival, particularly for patients with the most aggressive disease. However, because patients relapse after stopping treatment and undetectable minimal residual disease is rarely reached, treatment with ibrutinib requires long-term maintenance therapy. Adverse events (AEs) may therefore have a substantial impact on patients' quality of life and overall survival. Several studies have reported a higher risk of AEs, including severe infections, bleeding, and cardiovascular events such as atrial fibrillation (AF) and hypertension, in CLL patients treated with ibrutinib. Identifying patients at higher risk of developing AEs at the time of treatment initiation could inform individualized treatment and may reduce morbidity and mortality. In this study, we used machine learning (ML) to predict the risk of developing severe infection (≥ grade 3) after treatment with ibrutinib and/or other standard treatment regimens. Methods. The dataset consisted of 647 patients with CLL who were treated before December 2020. From the Danish CLL registry, the Persimune data warehouse, and electronic health record (EHR) data sources, we extracted baseline features, including age (at diagnosis and at treatment), sex, body mass index (BMI), and smoking status. We also extracted a range of medical features recorded up to the date of treatment initiation (the cut-off date): routine laboratory tests, microbiology cultures, known CLL prognostic markers, and historical data on previous infections and types of treatment. Treatment data and AEs were collected by manual review of EHRs and by direct data extraction. Features with more than 80% missing values were discarded. For continuous features, missing values were imputed with the mean of each feature estimated from the training set. 
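The two missing-data rules described above can be sketched in plain Python. The first rule discards features with more than 80% missing values. The second imputes continuous features with training-set means. This is a minimal illustration under several assumptions. Rows are dicts, and `None` marks a missing value. The 80% filter is computed on the training rows. The function names are hypothetical and are not the study's code. Fitting the means on the training fold only, and reusing them for the held-out fold, keeps test information out of the model.

```python
from statistics import mean

MISSING_THRESHOLD = 0.80  # features with MORE than 80% missing values are dropped


def select_features(rows, features, threshold=MISSING_THRESHOLD):
    """Keep features whose fraction of missing (None or absent) values is <= threshold."""
    kept = []
    for f in features:
        n_missing = sum(1 for r in rows if r.get(f) is None)
        if n_missing / len(rows) <= threshold:
            kept.append(f)
    return kept


def fit_means(train_rows, features):
    """Per-feature means from observed training values only (assumes continuous features).

    Features that passed select_features on the training rows always have
    at least one observed value, so mean() never sees an empty list.
    """
    return {f: mean(r[f] for r in train_rows if r.get(f) is not None) for f in features}


def impute(rows, features, means):
    """Replace missing values with the training-set mean; drop discarded features."""
    return [
        {f: (r[f] if r.get(f) is not None else means[f]) for f in features}
        for r in rows
    ]
```

In a cross-validation loop, `select_features` and `fit_means` would be called on each training fold. `impute` would then be applied to both the training fold and the held-out fold.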
Overall, 167 features were used in the modeling. Because some patients received multiple lines of treatment, the dataset comprised 1400 treatment events. Using an ML algorithm (XGBoost), Cox regression models were trained to predict severe infection (≥ grade 3). Stratified group cross-validation was used to preserve the proportion of samples with severe infection across splits. The contribution of each feature to the model predictions was measured using SHapley Additive exPlanations (SHAP). Results. We identified a high-risk group and a low-risk group with estimated risks of severe infection within one year of treatment initiation of 57% and 28%, respectively (Figure 1). The median survival times for the high-risk and low-risk groups were 7 (95% CI: 5.0-12.4) and 44 (95% CI: 39.0-56.2) months, respectively. The model could also discriminate the risk of infection within the ibrutinib-treated group and within the non-ibrutinib-treated group. Using 4-fold cross-validation, the model achieved the following c-indices:
- all treatments: 0.687 (95% CI: 0.649-0.725);
- ibrutinib: 0.672 (95% CI: 0.647-0.696);
- other treatments (all treatments except ibrutinib): 0.659 (95% CI: 0.597-0.721).
To assess the impact of individual features on the risk of severe treatment-related infection, we performed a SHAP analysis of feature contributions (Figure 2). Targeted therapy (the "Targeted treatment" feature) and ibrutinib treatment were associated with an increased risk of infection. This may partly reflect that targeted therapy was primarily used for patients with aggressive CLL (TP53 aberration and/or relapsed/refractory CLL). In addition, the number of blood cultures drawn before treatment correlated with the risk of severe infection. Furthermore, several routine laboratory tests, measuring immunoglobulin G, C-reactive protein, and high-density lipoprotein levels, were predictive of severe infection. Conclusion.
This study confirms the advantage of an ML-based approach for stratifying patients with CLL, before treatment initiation, by their risk of severe infection within and across treatment regimens. As infections are the main cause of death in CLL, this represents a first step towards individualized therapy based on the risk of AEs. Future work will focus on predicting cardiac events, including AF. Validation of these findings in clinical trial populations and other real-world data cohorts is awaited before clinical implementation.
Figure 1
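For reference, the Cox regression models described in the Methods rest on the standard Cox partial log-likelihood, which a gradient-boosted Cox model maximizes. This is the textbook formulation, not a detail reported in the abstract. The symbols are defined as follows:
- $f(x_i)$ is the model's log-hazard score for treatment event $i$;
- $t_i$ is the follow-up time for event $i$;
- $\delta_i = 1$ if a severe infection was observed and $\delta_i = 0$ if the event was censored;
- $R(t_i)$ is the risk set at $t_i$.

XGBoost's `survival:cox` objective minimizes $-\ell(f)$. According to its documentation, right-censored times are passed as negative labels, and predictions are returned on the hazard-ratio scale, $\exp f(x)$.

```latex
\ell(f) = \sum_{i:\,\delta_i = 1} \left[ f(x_i) - \log \sum_{j \in R(t_i)} \exp\big(f(x_j)\big) \right],
\qquad R(t_i) = \{\, j : t_j \ge t_i \,\}
```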
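Some patients contribute several treatment events. One plausible reading of "stratified group cross-validation" is therefore that all events of a patient stay in the same fold, while the share of events with severe infection is kept similar across folds. The standard-library sketch below implements that idea with a simple greedy assignment. Grouping by patient and the greedy rule are both assumptions. scikit-learn's `StratifiedGroupKFold` provides a library implementation that uses its own balancing heuristic.

```python
from collections import defaultdict


def stratified_group_folds(groups, labels, n_splits=4):
    """Assign every group (hypothetically, a patient) to exactly one fold,
    greedily balancing positive (severe infection) and negative events."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for g, y in zip(groups, labels):
        if y:
            pos[g] += 1
        else:
            neg[g] += 1
    all_groups = set(pos) | set(neg)
    # Place groups with the most positive events first, then larger groups;
    # sorting by str(g) last makes the assignment deterministic.
    order = sorted(all_groups, key=lambda g: (-pos[g], -(pos[g] + neg[g]), str(g)))
    fold_pos = [0] * n_splits
    fold_neg = [0] * n_splits
    fold_of = {}
    for g in order:
        if pos[g]:
            # Groups with positives go to the fold with the fewest positives so far.
            k = min(range(n_splits), key=lambda i: (fold_pos[i], fold_neg[i], i))
        else:
            # Groups without positives go to the fold with the fewest negatives so far.
            k = min(range(n_splits), key=lambda i: (fold_neg[i], fold_pos[i], i))
        fold_of[g] = k
        fold_pos[k] += pos[g]
        fold_neg[k] += neg[g]
    return [fold_of[g] for g in groups]


def split_indices(fold_ids, n_splits=4):
    """Yield (train_idx, test_idx) index lists, one pair per fold."""
    for k in range(n_splits):
        test = [i for i, f in enumerate(fold_ids) if f == k]
        train = [i for i, f in enumerate(fold_ids) if f != k]
        yield train, test
```

Keeping a patient's treatment lines in one fold prevents the model from being evaluated on a patient whose other treatment events it was trained on.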
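The c-index values reported in the Results measure how well the predicted risk ranks treatment events by time to severe infection. The minimal sketch below computes Harrell's concordance index for right-censored data. It assumes that higher scores mean higher risk, as with a Cox model's output. It skips pairs with tied follow-up times, which is only one of several tie conventions. The abstract does not say how its confidence intervals were obtained, so they are not reproduced here.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than j. Tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The reported values of about 0.66 to 0.69 therefore indicate moderate discrimination.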
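The abstract does not state how the one-year risks of 57% and 28% were estimated. The sketch below assumes one common approach:
1. Split treatment events into high-risk and low-risk groups at a cut-off on the predicted score. The median split used here is hypothetical.
2. In each group, report one minus the Kaplan-Meier infection-free probability at 12 months.

Times are in months, and `events[i]` is truthy when a severe infection was observed. All function names are illustrative.

```python
from statistics import median


def km_survival_at(times, events, horizon):
    """Kaplan-Meier estimate of the probability of remaining event-free through `horizon`."""
    # Distinct observed event times up to the horizon, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under follow-up at time t
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - n_events / at_risk
    return surv


def one_year_risk(times_months, events):
    """Estimated risk of severe infection within 12 months: 1 - S(12)."""
    return 1.0 - km_survival_at(times_months, events, 12.0)


def split_by_median(scores):
    """Hypothetical cut-off: scores strictly above the median are labelled 'high' risk."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]
```

For example, `one_year_risk` can be applied separately to the times and events of the events labelled "high" and those labelled "low". This yields one risk estimate per group.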
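SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by enumerating feature coalitions. Features outside a coalition are set to a single baseline vector, such as the training-set means; this baseline choice is a simplifying assumption. Enumeration costs on the order of 2^n model calls, so it is practical only for a handful of features. For tree ensembles such as XGBoost, the SHAP library's `TreeExplainer` computes Shapley-based attributions far more efficiently.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x, relative to a baseline input.

    v(S) = f(z), where z takes x's values for features in S and the baseline's
    values elsewhere. phi_i = sum over S without i of
    |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The attributions always sum to `f(x) - f(baseline)`. This is why SHAP plots can show how each feature pushes an individual prediction above or below the reference value.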