Abstract

A dynamic treatment regime is a sequence of decision rules, each of which recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for two-stage, binary treatment settings. This enables estimation of dynamic treatment regimes that optimize the cumulative distribution function of the response at a prespecified point or a prespecified quantile of the response distribution such as the median. The proposed methods perform favorably in simulation experiments. We illustrate our approach with data from a sequentially randomized trial where the primary outcome is remission of depression symptoms. Supplementary materials for this article are available online.

Highlights

  • A dynamic treatment regime operationalizes clinical decision making as a series of decision rules that dictate treatment over time.

  • With adjustments to our method of maximizing probabilities, we derive optimal decision rules for maximizing quantiles of the response distribution. Both frameworks can be used to study the entire distribution of the outcome under an optimal dynamic treatment regime; investigators can examine how the optimal regime changes as the target probability or quantile is varied.

  • We propose modeling frameworks for estimating optimal dynamic treatment regimes in settings where a non-mean distributional summary is the target of optimization.

Introduction

A dynamic treatment regime operationalizes clinical decision making as a series of decision rules that dictate treatment over time. The Q-learning algorithm is an approximate dynamic programming procedure that requires modeling nonsmooth, nonmonotone transformations of data. This leads to nonregular estimators for parameters that index the optimal regime and complicates the search for models that fit the data well, since many standard regression modeling diagnostics are invalid (Robins, 2004; Chakraborty et al., 2010; Laber et al., 2014c; Song et al., 2015). With adjustments to our method of maximizing probabilities, we derive optimal decision rules for maximizing quantiles of the response distribution. Both frameworks can be used to study the entire distribution of the outcome under an optimal dynamic treatment regime; investigators can examine how the optimal regime changes as the target probability or quantile is varied. The quantile framework provides an analog of quantile regression in the dynamic treatment regime setting for constructing robust estimators; for example, it enables optimization of the median response.
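To fix ideas, the following is a minimal sketch of conventional (mean-optimizing) two-stage Q-learning with linear working models, the baseline the methods above depart from. The simulated data-generating model and all variable names (x1, a1, x2, a2, y) are hypothetical illustrations, not the paper's estimators or data; the `np.maximum` step that forms the stage-1 pseudo-outcome is the nonsmooth, nonmonotone transformation referred to in the paragraph above.

```python
# A minimal sketch of standard (mean-optimizing) two-stage Q-learning with
# linear working models. The generative model and variable names below are
# hypothetical; this is not the paper's generalized interactive Q-learning.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Stage 1: baseline covariate and randomized binary treatment (coded -1/+1).
x1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)

# Stage 2: intermediate covariate, second randomized treatment, and response.
x2 = 0.5 * x1 + 0.3 * a1 + rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
y = x2 + a2 * (0.7 * x2 - 0.2) + rng.normal(size=n)  # larger y is better

def ols(design, response):
    """Least-squares fit of a linear working model."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Backward induction, step 1: model Y given stage-2 history and treatment.
d2 = np.column_stack([np.ones(n), x2, a2, a2 * x2])
b2 = ols(d2, y)

def q2(x2v, a2v):
    """Fitted stage-2 Q-function."""
    return b2[0] + b2[1] * x2v + a2v * (b2[2] + b2[3] * x2v)

# The pseudo-outcome maximizes the fitted Q-function over a2. This max() is
# the nonsmooth, nonmonotone transformation of the data discussed above.
y_tilde = np.maximum(q2(x2, 1.0), q2(x2, -1.0))

# Backward induction, step 2: regress the pseudo-outcome on stage-1 history.
d1 = np.column_stack([np.ones(n), x1, a1, a1 * x1])
b1 = ols(d1, y_tilde)

# Estimated optimal rules: the sign of each treatment-interaction contrast.
print(f"stage-2 rule: treat with sign({b2[2]:.2f} + {b2[3]:.2f} * x2)")
print(f"stage-1 rule: treat with sign({b1[2]:.2f} + {b1[3]:.2f} * x1)")
```

In the quantile framework described in the paper, the mean-regression objective in this backward induction is replaced so that a prespecified quantile of the response distribution (e.g., the median), rather than its mean, is the quantity being optimized; the sketch above only illustrates the conventional baseline that motivates that departure.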

Generalized Interactive Q-Learning
Threshold Interactive Q-Learning
Quantile Interactive Q-Learning
Theoretical Results
Simulation Experiments
QIQ-Learning Simulations
Discussion