Abstract Within the realm of Discrete Event Systems (DES) theory, the performance optimization problem for many applications can be modeled as an infinite-horizon, average-reward Markov Decision Process (MDP) with a finite state space. In principle, these MDPs can be solved by well-developed methods such as value iteration, policy iteration, and linear programming. In practice, however, the tractability of these methods for the aforementioned applications is compromised by the explosive size of the underlying state spaces, a problem known as “the curse of dimensionality”. Hence, the corresponding performance optimization problems are frequently addressed by heuristic control policies. The present work uses results from (i) the sensitivity analysis of Markov reward processes and (ii) ranking-and-selection theory in statistics in order to develop a methodology for assessing the optimality of isolated decisions in the context of any well-defined heuristic control policy for the aforementioned MDPs. The methodology also determines an improved decision when the current one is found to be suboptimal. Hence, when embedded in an iterative scheme, it can support the incremental enhancement of the original heuristic policy in a way that controls both the computational and the representational complexity of the new policy. An additional important feature of the presented methodology is that it can be executed either in an “off-line” mode, using a simulation of the dynamics of the underlying DES, or in an “on-line” mode, based on the sample path defined by the real-time dynamics of the controlled system.
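To make the general idea concrete, the following is a minimal, hypothetical Python sketch of the kind of decision assessment described above: for a small average-reward MDP, the action prescribed by a heuristic policy at a single state is compared against the alternative actions by simulating the correspondingly perturbed policies and applying a simple mean-comparison rule in the spirit of ranking & selection. All names, the toy model, and the brute-force comparison are illustrative assumptions and stand in for the paper's sensitivity-based estimators; they are not the authors' actual algorithm.

```python
# Hypothetical sketch: assessing one decision of a heuristic policy in a toy
# average-reward MDP via simulated sample paths (names and model are illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: transition tensor P[a, s, s'] and one-step rewards r[a, s].
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

heuristic_policy = rng.integers(0, n_actions, size=n_states)  # some given heuristic

def simulate_average_reward(policy, s0=0, horizon=20_000):
    """Estimate the long-run average reward of a stationary policy from one sample path."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy[s]
        total += r[a, s]
        s = rng.choice(n_states, p=P[a, s])
    return total / horizon

def assess_decision(policy, state, n_reps=5):
    """Compare the current action at `state` against each alternative by simulating
    the perturbed policy; return the apparently best action and the sample statistics."""
    stats = {}
    for a in range(n_actions):
        perturbed = policy.copy()
        perturbed[state] = a
        gains = np.array([simulate_average_reward(perturbed) for _ in range(n_reps)])
        stats[a] = (gains.mean(), gains.std(ddof=1) / np.sqrt(n_reps))
    best = max(stats, key=lambda a: stats[a][0])
    return best, stats

best_action, stats = assess_decision(heuristic_policy, state=2)
current = heuristic_policy[2]
print(f"current action {current}: mean={stats[current][0]:.4f} +/- {stats[current][1]:.4f}")
print(f"best action    {best_action}: mean={stats[best_action][0]:.4f} +/- {stats[best_action][1]:.4f}")
```

In this sketch each candidate action is evaluated by re-simulating the entire perturbed policy, which is the crudest possible estimator; the methodology summarized in the abstract instead draws on the sensitivity analysis of Markov reward processes to assess the isolated decision from sample-path information, and on ranking-and-selection procedures to control the statistical error of the comparison.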