Machine learning (ML) methods can identify complex patterns of treatment effect heterogeneity. However, before ML can help to personalize decision making, transparent approaches must be developed that draw on clinical judgment. We develop an approach that combines clinical judgment with ML to generate appropriate comparative effectiveness evidence for informing decision making. We motivate this approach by evaluating the effectiveness of nonemergency surgery (NES) strategies, such as antibiotic therapy, compared with emergency surgery (ES) for people with acute appendicitis who have multiple long-term conditions (MLTCs). Our 4-stage approach 1) draws on clinical judgment about which patient characteristics and morbidities modify the relative effectiveness of NES; 2) selects additional covariates from a high-dimensional covariate space (P > 500) by applying an ML approach, the least absolute shrinkage and selection operator (LASSO), to large-scale administrative data (N = 24,312); 3) generates estimates of comparative effectiveness for relevant subgroups; and 4) presents the evidence in a form suitable for decision making. This approach provides useful evidence for clinically relevant subgroups. We found that, overall, NES strategies increased the mean number of days alive and out of hospital compared with ES, but estimates differed across subgroups, ranging from 21.2 days (95% confidence interval: 1.8 to 40.5) for patients with chronic heart failure and chronic kidney disease to -10.4 days (-29.8 to 9.1) for patients with cancer and hypertension. Our interactive tool for visualizing ML output allows findings to be customized according to the specific needs of the clinical decision maker. This principled approach of combining clinical judgment with ML can improve trust in the evidence generated for clinical decision making, as well as its relevance and usefulness.
Machine learning (ML) methods have many potential applications in medical decision making, but the lack of model interpretability and usability is an important barrier to the wider adoption of ML evidence in practice. We develop a 4-stage approach for integrating clinical judgment into the way an ML approach is used to estimate and report comparative effectiveness. We illustrate the approach by evaluating nonemergency surgery (NES) strategies for acute appendicitis in patients with multiple long-term conditions, and find that NES strategies lead to better outcomes than emergency surgery and that the effects differ across subgroups. We develop an interactive tool for visualizing the results of this study that allows findings to be customized according to the user's preferences.
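Stage 2 of the 4-stage approach applies LASSO to select additional covariates, but the text does not say how the model was fit or how the stage 1 covariates were carried into it. The block below is a minimal pure-Python sketch under stated assumptions: a linear outcome model with a squared-error loss, fit by cyclic coordinate descent, where the covariates chosen by clinical judgment in stage 1 are forced into the model by leaving them unpenalized. This forcing mechanism, the function names `lasso_select` and `soft_threshold`, and the penalty parameter `lam` are hypothetical illustrations, not the authors' implementation.

```python
"""Hypothetical sketch of stage 2: LASSO covariate selection in which
covariates chosen by clinical judgment (stage 1) are forced into the model
by giving them zero penalty. Assumes a linear model; not the authors' code."""
from typing import List, Sequence


def soft_threshold(a: float, t: float) -> float:
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_select(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    forced: Sequence[int] = (),
    n_iter: int = 500,
    tol: float = 1e-8,
) -> List[int]:
    """Return indices of covariates kept by the (assumed) selection rule.

    Minimises (1/2n) * ||y - Xb||^2 + lam * sum_j w_j * |b_j| by cyclic
    coordinate descent, with w_j = 0 for indices in `forced` and w_j = 1
    otherwise. Columns and y are centred, so no intercept is fitted.
    A covariate is kept if its coefficient is non-zero or it is forced.
    """
    n, p = len(X), len(X[0])
    # Centre each covariate column and the outcome.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residuals; all coefficients start at 0
    b = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    forced_set = set(forced)

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of column j with the partial residual that
            # excludes column j's current contribution.
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            penalty = 0.0 if j in forced_set else lam
            new_bj = soft_threshold(rho, penalty) / z[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep residuals consistent with the updated coefficient.
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return [j for j in range(p) if b[j] != 0.0 or j in forced_set]
```

For example, with four observations on three mutually orthogonal covariates and an outcome that depends strongly on covariate 0 and weakly on covariate 1, a penalty of `lam=0.5` shrinks the weak coefficient to exactly zero and keeps only covariate 0, while passing `forced=[2]` also retains covariate 2 because it is never penalized. In practice the penalty is usually tuned, for example by cross-validation, rather than fixed by hand.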