Abstract

Objective: To apply the latest guidance for estimating and evaluating heterogeneous treatment effects (HTEs) in an end-to-end case study of the Randomized Evaluation of Long-term Anticoagulation Therapy (RE-LY) trial, and to summarize the main takeaways from applying state-of-the-art metalearners and novel evaluation metrics, in order to inform their use for personalized care in biomedical research.

Methods: Based on the characteristics of the RE-LY data, we selected four metalearners (S-learner with Lasso, X-learner with Lasso, R-learner with random survival forest and Lasso, and causal survival forest) to estimate the HTEs of dabigatran. For the outcomes of (1) stroke or systemic embolism (SE) and (2) major bleeding, we compared dabigatran 150 mg, dabigatran 110 mg, and warfarin. We assessed whether the metalearners overestimated treatment heterogeneity via a global null analysis, and we assessed their discrimination and calibration using two novel metrics: rank-weighted average treatment effects (RATE) and estimated calibration error for treatment heterogeneity. Finally, we visualized the relationships between estimated treatment effects and baseline covariates using partial dependence plots.

Results: For both the stroke/SE and major bleeding outcomes, and for every treatment comparison, the RATE metric suggested that either the applied metalearners estimated HTEs poorly or there was no treatment heterogeneity. Partial dependence plots revealed that several covariates had consistent relationships with the treatment effects estimated by multiple metalearners. The applied metalearners performed differently across outcomes and treatment comparisons, and the X- and R-learners yielded smaller calibration errors than the others.

Conclusions: HTE estimation is difficult, and a principled estimation and evaluation process is necessary to provide reliable evidence and prevent false discoveries. We demonstrated how to choose appropriate metalearners based on specific data properties, applied them using the off-the-shelf implementation tool survlearners, and evaluated their performance using recently defined formal metrics. We suggest that clinical implications be drawn from the common trends across the applied metalearners.
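
To make the RATE metric named in the Methods more concrete, the following is a minimal Python sketch of one common RATE variant, the area under the targeting operator characteristic curve (AUTOC). It is not the study's implementation: the function names (`ipw_scores`, `toc_curve`, `autoc`) are hypothetical, the scores are simple inverse-probability-weighted scores that assume a randomized trial with a known treatment probability `e`, and right-censoring of the survival outcomes is ignored entirely. Ties in the priority are broken by input order.

```python
import numpy as np


def ipw_scores(y, w, e):
    """Per-patient effect scores for a randomized trial (assumption: known
    treatment probability e, fully observed outcomes, no censoring)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return w * y / e - (1.0 - w) * y / (1.0 - e)


def toc_curve(priority, scores):
    """TOC at q = 1/n, ..., 1: mean score among the top-q fraction of
    patients ranked by priority, minus the overall mean score."""
    priority = np.asarray(priority, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rank patients from highest to lowest estimated benefit.
    order = np.argsort(-priority, kind="stable")
    sorted_scores = scores[order]
    n = len(scores)
    top_means = np.cumsum(sorted_scores) / np.arange(1, n + 1)
    return top_means - scores.mean()


def autoc(priority, scores):
    """Area under the TOC curve, approximated by averaging over the grid."""
    return toc_curve(priority, scores).mean()
```

Here `priority` would be the treatment effects estimated by a metalearner on held-out patients. An AUTOC close to zero is consistent with either of the two readings given in the Results: the ranking is uninformative, or there is little heterogeneity to rank.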

