Consider the following “inverse stochastic control” problem. A statistician observes a realization of a controlled stochastic process $\{ d_t ,x_t \} $ consisting of the sequence of states $x_t$ and decisions $d_t$ of an agent at times $t = 1, \ldots ,T$. The null hypothesis is that the agent’s behavior is generated from the solution to a Markovian decision problem. The inverse problem is to use the data $\{ d_t ,x_t \} $ to go backward and “uncover” the agent’s objective function $U$ and his beliefs about the law of motion of the state variables $p$. The problem is complicated by the fact that the statistician generally observes only a subset $x_t$ of the state variables $(x_t ,\eta _t )$ observed by the agent. This paper formulates the inverse problem as a problem of statistical inference, explicitly accounting for the unobserved state variables $\eta _t $, in order to produce a nondegenerate and internally consistent statistical model. Specifically, the functions $U$ and $p$ are assumed to depend on a vector of unknown parameters $\theta $ known by the agent but not by the statistician. The agent’s preferences and expectations are uncovered by finding a parameter vector $\hat \theta $ that maximizes the likelihood function for the observed sample of data. The difficulty is that neither the dynamic programming problem nor the associated likelihood function has an a priori known functional form; in general the solution is only described recursively via Bellman’s “principle of optimality.” This paper derives a nested fixed-point maximum likelihood algorithm that computes $\hat \theta $ and the associated value function $V_{\hat \theta } $ for a class of discrete control processes $(d_t ,x_t )$, where the control variable $d_t$ is restricted to a finite set of alternatives.
Given $M$ independent realizations of $(d_t ,x_t )$ for $T$ time periods, it is shown that $\hat \theta $ converges to the true parameter $\theta ^ * $ with probability 1 and has an asymptotic Gaussian distribution as $M$ (or the number of periods $T$) approaches infinity. Uniform convergence of the algorithm is established by showing that the estimated value function $V_{\hat \theta } $ (a random element of a Banach space $B$) converges with probability 1 to the true value function and has an asymptotic Gaussian distribution in $B$.
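The nested fixed-point idea can be sketched on a toy discrete control process. The inner loop solves Bellman’s equation by successive approximation for a candidate $\theta$ (with i.i.d. extreme-value unobservables $\eta_t$, the fixed point reduces to a log-sum-exp contraction and choice probabilities take a logit form); the outer loop searches over $\theta$ to maximize the sample likelihood. Everything concrete below — the utility $u(x,d;\theta) = -\theta x$ for “keep,” the replacement cost, the deterministic state transitions, the grid size, and the use of grid search rather than a gradient method — is an illustrative assumption, not the paper’s model:

```python
import math
import random

# Illustrative constants (assumed, not from the paper): discount factor,
# replacement cost, and number of points on the finite state grid.
BETA, RC, N_STATES = 0.95, 3.0, 5

def solve_value(theta, tol=1e-10):
    """Inner fixed point: iterate the Bellman operator to convergence.

    With i.i.d. extreme-value shocks eta_t, the value function satisfies
    V(x) = log(exp(v_keep(x)) + exp(v_replace(x))), a contraction with
    modulus BETA, so successive approximation converges geometrically.
    """
    V = [0.0] * N_STATES
    while True:
        # Choice-specific values: keep (state drifts up) vs. replace (reset to 0).
        v_keep = [-theta * x + BETA * V[min(x + 1, N_STATES - 1)] for x in range(N_STATES)]
        v_repl = [-RC + BETA * V[0] for _ in range(N_STATES)]
        V_new = [math.log(math.exp(k) + math.exp(r)) for k, r in zip(v_keep, v_repl)]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return v_keep, v_repl
        V = V_new

def choice_probs(theta):
    """Logit choice probabilities P(replace | x) implied by theta."""
    v_keep, v_repl = solve_value(theta)
    return [1.0 / (1.0 + math.exp(k - r)) for k, r in zip(v_keep, v_repl)]

def log_lik(theta, data):
    """Sample log-likelihood of observed (state, decision) pairs at theta."""
    p = choice_probs(theta)
    return sum(math.log(p[x] if d == 1 else 1.0 - p[x]) for x, d in data)

# Simulate a panel from a "true" theta* and estimate it with the outer loop
# (a crude grid search over candidate theta values, each requiring an inner solve).
random.seed(0)
theta_star = 1.0
p_star = choice_probs(theta_star)
data, x = [], 0
for _ in range(2000):
    d = 1 if random.random() < p_star[x] else 0
    data.append((x, d))
    x = 0 if d == 1 else min(x + 1, N_STATES - 1)

grid = [0.2 * i for i in range(1, 16)]
theta_hat = max(grid, key=lambda th: log_lik(th, data))
```

Note the nesting: every likelihood evaluation in the outer search calls `solve_value`, so the value function is recomputed as a fixed point at each trial $\theta$; the estimate $\hat\theta$ recovered from the simulated panel should lie close to $\theta^* = 1$.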