Abstract

Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, for example, Q-learning, which indirectly attempt such maximization and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and provide finite sample bounds for the errors using the estimated rules. Simulation results suggest the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation. Supplementary materials for this article are available online.
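To make the idea of directly maximizing a nonparametric estimator of the expected long-term outcome concrete, the display below is a minimal single-stage sketch in the spirit of outcome weighted learning. The notation is introduced here purely for illustration and does not appear in this excerpt: X_i denotes patient covariates, A_i ∈ {-1, 1} the assigned treatment, R_i the observed clinical outcome (larger is better), and π(a | x) the known randomization probability.

% Minimal single-stage sketch; all symbols are illustrative assumptions.
% The value of a decision rule d is estimated by inverse probability weighting,
% and maximizing it over d is a weighted classification problem with labels A_i
% and weights R_i / pi(A_i | X_i).
\[
  \widehat{V}_n(d) \;=\; \frac{1}{n}\sum_{i=1}^{n}
    \frac{R_i\,\mathbf{1}\{A_i = d(X_i)\}}{\pi(A_i \mid X_i)},
  \qquad
  \widehat{d} \;\in\; \arg\max_{d \in \mathcal{D}} \widehat{V}_n(d).
\]

BOWL and SOWL extend this single-stage construction to multiple decision points, working backward through the stages or over all stages simultaneously, respectively; the display is only meant to show why value maximization can be recast as weighted classification.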

Highlights

  • It is widely recognized that the best clinical regimes adapt to patients over time, since there is significant heterogeneity among patients and the disease is often evolving and as diversified as the patients themselves (Wagner et al., 2001)

  • Existing support vector machine (SVM) algorithms are adjusted and further developed for simultaneous outcome weighted learning (SOWL). We demonstrate that both backward outcome weighted learning (BOWL) and SOWL consistently estimate the optimal decision rule, and that they provide better dynamic treatment regimes (DTRs) than Q-learning and A-learning in simulated experiments

  • We formalize the problem of estimating an optimal DTR in a mathematical framework. We reformulate this problem as a weighted classification problem and, based on this reformulation, propose two new learning methods for estimating the optimal DTR (an illustrative single-stage sketch follows this list)
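As a concrete, entirely illustrative companion to the highlights above, the snippet below sketches how a single-stage outcome weighted classification problem could be handled with an off-the-shelf weighted SVM. It is not the authors' BOWL or SOWL implementation: the names (fit_single_stage_owl, X, A, R, propensity) are assumptions for this sketch, outcomes are shifted to be nonnegative so the weights are valid, and scikit-learn's SVC with per-sample weights stands in for the modified SVM algorithms developed in the paper.

import numpy as np
from sklearn.svm import SVC

def fit_single_stage_owl(X, A, R, propensity, C=1.0):
    """Illustrative single-stage outcome weighted learning via a weighted SVM.

    X : (n, p) covariates; A : (n,) treatments coded as -1/+1;
    R : (n,) observed outcomes (larger is better);
    propensity : (n,) probability of the treatment actually received.
    All names and inputs are assumptions for this sketch, not the paper's API.
    """
    # Shift outcomes so the classification weights are nonnegative.
    weights = (R - R.min()) / propensity
    # Weighted classification: labels are the received treatments, weights are
    # the (shifted) outcomes divided by the randomization probabilities.
    clf = SVC(kernel="linear", C=C)
    clf.fit(X, A, sample_weight=weights)
    return clf  # clf.predict(x_new) returns the recommended treatment in {-1, +1}

# Toy usage with simulated data from a randomized trial (propensity = 0.5).
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)
R = 1.0 + X[:, 0] * A + rng.normal(scale=0.5, size=n)  # effect depends on X[:, 0]
rule = fit_single_stage_owl(X, A, R, propensity=np.full(n, 0.5))

A linear kernel is used only to keep the sketch short; any kernel supported by the weighted SVM would serve the same role.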


Summary

Introduction

It is widely recognized that the best clinical regimes are adaptive to patients over time, since there exists significant heterogeneity among patients, and often the disease is evolving and as diversified as the patients themselves (Wagner et al., 2001). Treatment for major depressive disorder is usually driven by factors emerging over time, such as side-effect severity and treatment adherence (Murphy et al., 2007); the treatment regimen for non-small cell lung cancer involves multiple lines of treatment (Socinski and Stinchcombe, 2007); and clinicians routinely update therapy according to the risk of toxicity and antibiotic resistance in treating cystic fibrosis (Flume et al., 2007). As these examples make clear, in many cases a “once and for all” treatment strategy is not only suboptimal due to its inflexibility but also unrealistic (or even unethical). Instead of focusing on the short-term benefit of a treatment, an effective treatment strategy should aim for long-term benefits by accounting for delayed treatment effects.

