Let $A$ be a transition probability kernel on a finite state space $\Delta^o = \{1, \dots, d\}$ such that $A(x, y) > 0$ for all $x, y \in \Delta^o$. Consider a reinforced chain given as a sequence $\{X_n,\ n \in \mathbb{N}_0\}$ of $\Delta^o$-valued random variables, defined recursively according to
$$L_n = \frac{1}{n} \sum_{i=0}^{n-1} \delta_{X_i}, \qquad P(X_n \in \cdot \mid X_0, \dots, X_{n-1}) = L_n A(\cdot).$$
We establish a large deviation principle for $\{L_n,\ n \in \mathbb{N}\}$. The rate function takes a strikingly different form from the Donsker–Varadhan rate function associated with the empirical measure of the Markov chain with transition kernel $A$; it is described in terms of a novel deterministic infinite-horizon discounted-cost control problem with linear controlled dynamics and a nonlinear running cost involving the relative entropy function. Proofs are based on an analysis of time reversal of controlled dynamics in representations for log-transforms of exponential moments, and on weak convergence methods.
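The recursion above admits a simple sampling interpretation: drawing $X_n$ from $L_n A(\cdot)$ amounts to picking a uniformly random past state $X_i$, $i < n$, and making one transition from it under $A$. The following is a minimal simulation sketch of this mechanism; the kernel `A` and the function name are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_reinforced_chain(A, x0, n_steps, rng=None):
    """Simulate the reinforced chain: at step n, choose a past state X_i
    with i uniform on {0, ..., n-1}, then transition from it via the
    kernel A -- i.e., sample X_n from the mixture L_n A(.)."""
    rng = rng if rng is not None else np.random.default_rng()
    d = A.shape[0]
    path = [x0]
    for n in range(1, n_steps):
        anchor = path[rng.integers(n)]          # reinforcement: revisit the past
        path.append(rng.choice(d, p=A[anchor])) # one step of the kernel A
    return np.array(path)

# Example: a strictly positive kernel on a 3-point state space
# (states indexed 0, 1, 2 for programming convenience).
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
path = simulate_reinforced_chain(A, 0, 10_000, np.random.default_rng(0))
L_n = np.bincount(path, minlength=3) / len(path)  # empirical measure L_n
```

The empirical measure `L_n` computed at the end is the object whose large deviations the paper studies; the simulation only illustrates the dynamics, not the rate function.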