A Tauberian Theorem and Uniform ε-Optimality in Hidden Markov Decision Problems
In this chapter, we prove a Tauberian theorem on the relation between the Abel limit and the Cesàro limit of a sequence of real numbers, and apply it to prove that a uniformly $\varepsilon$-optimal strategy exists in hidden Markov decision problems.
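For context (standard definitions, not quoted from the chapter): the Cesàro limit is the long-run average of the sequence, while the Abel limit is its vanishing-discount average. Cesàro convergence of a bounded sequence always implies Abel convergence to the same value; a Tauberian theorem supplies conditions, such as boundedness, under which the converse holds. In MDP terms these correspond to the long-run average and the vanishing-discount value criteria.

```latex
% Cesàro (long-run average) limit of a real sequence (a_n):
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} a_n
% Abel (vanishing-discount) limit, discount factor \lambda increasing to 1:
\lim_{\lambda\uparrow 1}\,(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} a_n
```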
- Research Article
482
- 10.1137/1009030
- Apr 1, 1967
- SIAM Review
Contraction Mappings in the Theory Underlying Dynamic Programming, by Eric V. Denardo
- Research Article
2
- 10.14279/depositonce-2661
- Dec 15, 2010
- DepositOnce
Providing realistic performance indicators of online algorithms for a given online optimization problem is a difficult task in general. Due to significant drawbacks of other concepts like competitive analysis, Markov decision problems (MDPs) may offer an attractive alternative whenever reasonable stochastic information about future requests is available. However, the number of states in MDPs emerging from real applications is usually exponential in the original input parameters. Therefore, the standard methods for analyzing policies, i.e., online algorithms in our context, are infeasible. In this thesis we propose a new computational tool to evaluate the behavior of policies for discounted MDPs locally, i.e., depending on a particular initial state. The method is based on a column generation algorithm for approximating the total expected discounted cost of an unknown optimal policy, a concrete policy, or a single action (which assumes actions at other states to be taken according to an optimal policy). The algorithm determines an $\varepsilon$-approximation by inspecting only relatively small local parts of the total state space. We prove that the number of states required for providing the approximation is independent of the total number of states, which underlines the practicability of the algorithm. The approximations obtained by our algorithm are typically much better than the theoretical bounds obtained by other approaches. We investigate the pricing problem and the structure of the linear programs encountered in the column generation. Moreover, we propose and analyze different extensions of the basic algorithm in order to achieve good approximations quickly. The potential of our analysis tool is exemplified for discounted MDPs emerging from different online optimization problems, namely online bin coloring, online target date assignment, and online elevator control. The results of the experiments are quite encouraging: our method usually provides performance indicators for online algorithms that reflect observations made in simulations much better than competitive analysis does. Moreover, the analysis reveals weaknesses of the considered online algorithms. In this way, we developed a new online algorithm for the online bin coloring problem that outperforms existing ones in our analyses and simulations.
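As background on the quantity being approximated: for a fixed stationary policy, the total expected discounted cost solves a linear system over the full state space, which is exactly what becomes intractable for exponentially many states. The sketch below (standard theory with made-up data, not the thesis's column generation algorithm) computes it exactly on a tiny MDP.

```python
import numpy as np

# Exact total expected discounted cost of a fixed stationary policy:
# v satisfies v = c + gamma * P v, i.e. v = (I - gamma * P)^{-1} c.
# The thesis approximates this quantity locally via column generation
# instead of solving the system over the full state space.
gamma = 0.9                                # discount factor
P = np.array([[0.8, 0.2],                  # P[s, s']: transition probabilities
              [0.3, 0.7]])                 # under the fixed policy
c = np.array([1.0, 4.0])                   # one-step cost in each state

v = np.linalg.solve(np.eye(2) - gamma * P, c)
print(v)  # total expected discounted cost from each initial state
```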
- Research Article
- 10.2139/ssrn.3579795
- Apr 18, 2020
- SSRN Electronic Journal
Most important economic decision problems are sequential, and thus naturally represented as Markov decision problems (MDPs). A review of the theory of MDPs suggests, however, that applying MDPs to real-life sequential decisions is impractical. The central question addressed in this essay is how ordinary humans behave in the real-life sequential decision problems they face. A formal behavioral approach is presented, and it too appears impractical for all but toy MDPs. Introspection yields a key insight: the enormous extent to which information and reinforcement provided by parents, teachers, and others shapes our behavior in real life. With these insights, an integrated framework of dynamic behavior emerges in which genetic evolution, short-term reinforcement learning, and long-term acquisition of knowledge via institutions are seen as important aspects.
- Research Article
- 10.1609/icaps.v35i1.36095
- Sep 16, 2025
- Proceedings of the International Conference on Automated Planning and Scheduling
Popular algorithms to solve Markov Decision Problems (MDPs) include policy iteration and the Simplex method (executed on an induced linear program). Each run of these algorithms can be associated with a sequence of "locally-improving" policies for the input MDP. For integers n >= 2, k >= 2, let f(n, k) denote the longest possible sequence of locally-improving policies for any MDP with n states and k actions per state. An alternative view of f(n, k) is as a descriptive structural property of the policy space of MDPs: it is the largest possible "c-height" in an induced "LP-digraph" of any n-state, k-action MDP. How large can f(n, k) be? A trivial upper bound on f(n, k) is the total number of (Markovian, deterministic) policies, which is k^{n}. A construction from Melekopoglou and Condon (1994) shows that f(n, 2) = 2^{n}, implying that the trivial upper bound is tight for k = 2. For k >= 3, the tightest lower bound on f(n, k) in the current literature is only Omega(k^{n / 2}) (Ashutosh et al., 2020). In this paper, we propose a family of MDPs to show a lower bound of Omega((floor(k / 2))^{n}) on f(n, k)---giving an exponential-in-n tightening for each k >= 6. Our investigation brings out technical challenges that do not arise for k = 2. Our result still leaves open the important question of whether f(n, k) is indeed k^{n} for n >= 2, k >= 2. We furnish an affirmative answer for the special case of n = 2, k >= 2.
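To make the notion of a locally-improving sequence concrete, the sketch below runs Howard's policy iteration on a random discounted MDP and records the policy at each step. This only illustrates the object f(n, k) counts, not the paper's lower-bound construction; f(n, k) ranges over arbitrary locally-improving sequences (for example, single-switch Simplex runs), which can be far longer than Howard's variant produces.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, gamma = 4, 3, 0.9                       # states, actions, discount factor
P = rng.dirichlet(np.ones(n), size=(n, k))    # P[s, a, s']: transitions
r = rng.random((n, k))                        # r[s, a]: one-step rewards

def evaluate(pi):
    """Exact discounted value of a deterministic policy pi."""
    Ppi = P[np.arange(n), pi]                 # n x n transition matrix under pi
    rpi = r[np.arange(n), pi]
    return np.linalg.solve(np.eye(n) - gamma * Ppi, rpi)

pi = np.zeros(n, dtype=int)
sequence = [pi.copy()]
while True:
    q = r + gamma * P @ evaluate(pi)          # q[s, a] given pi's future values
    improved = pi.copy()
    for s in range(n):                        # switch only strictly improving
        if q[s].max() > q[s, pi[s]] + 1e-12:  # actions, so the loop terminates
            improved[s] = q[s].argmax()
    if np.array_equal(improved, pi):
        break
    pi = improved
    sequence.append(pi.copy())
print(len(sequence), "policies in the locally-improving sequence")
```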
- Conference Article
2
- 10.24963/ijcai.2017/248
- Aug 1, 2017
The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP PLANNING, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved STRONG WORST-CASE upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the discount factor and the number of bits needed to represent the MDP. Worst-case bounds apply to EVERY run of an algorithm; randomised algorithms can typically yield faster EXPECTED running times. While the special case of 2-action MDPs (that is, k = 2) has recently received some attention, bounds for general k have not been improved in several decades. Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. This bound is only a polynomial improvement over a trivial bound of poly(n, k) k^{n} [Mansour and Singh, 1999]. In this paper, we generalise a contrasting algorithm called the Fibonacci Seesaw, and derive a bound of poly(n, k) k^{0.6834n}. The key construct we use is a template to map algorithms for the 2-action setting to the general setting. Interestingly, this idea can also be used to design Policy Iteration algorithms with a running time upper bound of poly(n, k) k^{0.7207n}. Both our results improve upon bounds that have stood for several decades.
- Research Article
68
- 10.1287/opre.15.3.559
- Jun 1, 1967
- Operations Research
In a Markovian decision problem, choice of an action determines an immediate return and the probability of moving to the next state. It is desired to maximize the expected total of discounted future returns. If upper and lower bounds on the optimal expected return are available, a simple test is described that may show that certain actions are suboptimal, permanently eliminating them from further consideration. This test may be incorporated into the dynamic programming routine for solving the decision problem. This was tried on Howard's automobile replacement problem, using the upper and lower bounds described in “A Modified Dynamic Programming Method” (J. Math. Anal. and Appl. 14, April, 1966). The amount of computation required by the dynamic programming routine was reduced, conservatively, by 75 per cent.
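A schematic version of such an elimination test (my own illustration under standard assumptions, not the paper's exact routine): given vectors v_lo <= v* <= v_hi bounding the optimal discounted return, an action whose optimistic one-step evaluation is beaten by some competitor's pessimistic evaluation can never be optimal and is discarded for good.

```python
import numpy as np

# Schematic MacQueen-style action-elimination test (illustrative only).
# Given bounds v_lo <= v* <= v_hi on the optimal discounted return, action a
# at state s is permanently suboptimal whenever its optimistic value falls
# below the best pessimistic value available at s.
def eliminate_actions(r, P, gamma, v_lo, v_hi, active):
    """r[s, a]: returns; P[s, a, s']: transitions; active: boolean mask."""
    n, k = r.shape
    q_hi = r + gamma * P @ v_hi              # optimistic value of each (s, a)
    q_lo = r + gamma * P @ v_lo              # pessimistic value of each (s, a)
    for s in range(n):
        best_lo = q_lo[s][active[s]].max()   # best guaranteed value at s
        active[s] &= q_hi[s] >= best_lo      # drop provably suboptimal actions
    return active

# Usage sketch: start with active = np.ones((n, k), dtype=bool) and refresh it
# on each pass of the dynamic programming routine as v_lo, v_hi tighten.
```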
- Book Chapter
3
- 10.1007/978-1-4899-7491-4_6
- Aug 7, 2014
This chapter focuses on a problem of control optimization, in particular the Markov decision problem (or process). Our discussions will be at a very elementary level, and we will not attempt to prove any theorems. The central aim of this chapter is to introduce the reader to classical dynamic programming in the context of solving Markov decision problems. In the next chapter, the same ideas will be presented in the context of simulation-based dynamic programming. The main concepts presented in this chapter are (1) Markov chains, (2) Markov decision problems, (3) semi-Markov decision problems, and (4) classical dynamic programming methods.
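To give a concrete taste of the classical dynamic programming the chapter introduces, here is a minimal value iteration sketch on a made-up two-state discounted MDP (a generic textbook method, not code from the chapter):

```python
import numpy as np

# Classical value iteration on a tiny discounted-reward MDP (illustrative data).
gamma = 0.95
# P[s, a, s']: transition probabilities; r[s, a]: expected one-step reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

v = np.zeros(2)
while True:
    v_new = (r + gamma * P @ v).max(axis=1)   # Bellman optimality backup
    if np.abs(v_new - v).max() < 1e-8:
        break
    v = v_new
policy = (r + gamma * P @ v).argmax(axis=1)   # greedy policy w.r.t. v
print(v, policy)
```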
- Conference Article
7
- 10.1109/gamenets.2014.7043725
- Nov 1, 2014
Cell handover has been considered one of the most challenging issues in LTE-A macro-femtocell networks, due to the ad hoc deployment nature of Femto Base Stations (FBSs). In this paper, our goal is to achieve seamless handover of a Mobile Terminal (MT) among different cells while improving the system throughput. First, we formulate the handover decision and channel allocation problem as a Markov Decision Problem (MDP) whose objective is to maximize the total discounted expected reward per connection over an infinite horizon. The MDP formulation takes various system parameters into account, such as MT moving velocity, MT buffer size, cell switching cost, and call dropping penalty. Then, we propose an algorithm based on Q-learning techniques to obtain the optimal cell handover and channel allocation policy. Simulation results show that our proposed algorithm is effective and can maximize the expected total reward and reduce the average number of handovers.
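The core of such a scheme is the standard tabular Q-learning update, sketched below in generic form; the paper's handover-specific state and action encodings, reward model, and simulator are not reproduced, and all names here are illustrative.

```python
import random
from collections import defaultdict

# Generic tabular Q-learning skeleton (standard technique, illustrative only).
Q = defaultdict(float)                       # Q[(state, action)] -> value

def choose_action(state, actions, eps=0.1):
    """Epsilon-greedy exploration over a finite action set."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning backup toward the greedy value of the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
```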
- Book Chapter
- 10.1007/978-1-4899-7491-4_11
- Aug 7, 2014
This chapter will discuss the proofs of optimality of a subset of algorithms discussed in the context of control optimization. The chapter is organized as follows. We begin in Sect. 2 with some definitions and notation related to discounted and average reward Markov decision problems (MDPs). Subsequently, we present convergence theory related to dynamic programming (DP) for MDPs in Sects. 3 and 4. In Sect. 5, we discuss some selected topics related to semi-MDPs (SMDPs). Thereafter, from Sect. 6, we present a selected collection of topics related to convergence of reinforcement learning (RL) algorithms.
- Research Article
19
- 10.1007/s00500-010-0581-3
- Mar 28, 2010
- Soft Computing
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is suggested to search for optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently both for linear function approximators and for kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy in a few iterations but also obtains comparable or even better performance than Sarsa learning and previous approximate policy iteration methods such as LSPI and KLSPI.
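As background on the value-estimation step the abstract mentions, here is generic semi-gradient TD(0) with a linear function approximator; CAPI's continuous-action policy search and adaptive basis selection are not reproduced, and all names are illustrative.

```python
import numpy as np

# Semi-gradient TD(0) with linear function approximation: the standard kind of
# temporal-difference value estimation approaches like CAPI build on.
def td0_update(w, phi_s, reward, phi_s_next, alpha=0.05, gamma=0.99):
    """w: weights; phi_s, phi_s_next: feature vectors of successive states."""
    td_error = reward + gamma * phi_s_next @ w - phi_s @ w
    return w + alpha * td_error * phi_s      # v(s) ~ phi(s) . w after learning
```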
- Research Article
- 10.2139/ssrn.2785592
- Sep 17, 2019
- SSRN Electronic Journal
The paper considers a class of decision problems with infinite time horizon that contains Markov decision problems as an important special case. Our interest concerns the case where the decision maker cannot commit himself to his future action choices. We model the decision maker as consisting of multiple selves, where each history of the decision problem corresponds to one self. Each self is assumed to have the same utility function as the decision maker. We introduce the notions of Nash equilibrium, subgame perfect equilibrium, and curb sets for decision problems. An optimal policy at the initial history is a Nash equilibrium but not vice versa. Both subgame perfect equilibria and curb sets are equivalent to subgame optimal policies. The concept of a subgame optimal policy is therefore robust to the absence of commitment technologies.
- Research Article
19
- 10.1016/j.knosys.2022.108221
- Jan 25, 2022
- Knowledge-Based Systems
Transfer reinforcement learning via meta-knowledge extraction using auto-pruned decision trees
- Conference Article
19
- 10.1109/vppc.2014.7007115
- Oct 1, 2014
This paper proposes the application of the Markov decision problem (MDP) framework for optimizing the autonomous charging of individual plug-in electric vehicles (EVs). Two infinite horizon average cost MDP formulations are described, one for plug-in hybrid electric vehicles (PHEVs) and one for battery only electric vehicles (BEVs). In both formulations, we assume no direct input from the driver to the smart charger about the driver's travel schedule. Instead, we use stochastic models of plug-in and unplug behaviors as well as energy required for transportation to represent a driver's charging requirements. We also assume that electric energy prices follow a Markov random process. These stochastic models can be built from historical data on vehicle usage. The objective of the MDPs is to minimize the sum of electric energy charging costs, driving costs, and the cost of any driver inconvenience. We demonstrate the solution of the MDPs with assumed parameter values and analyze the results. This work presents a new approach to minimizing EV charging costs while reducing the need for trip planning by a driver.
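For context on the average-cost criterion used here, relative value iteration is a standard solution method for such MDPs; the sketch below is a minimal generic version (assuming a unichain, aperiodic model), with the paper's charging-specific states, prices, and costs left as placeholder inputs.

```python
import numpy as np

# Relative value iteration for an average-cost MDP (standard method, assuming
# a unichain, aperiodic model). c and P are placeholders, not the paper's
# charging model.
def relative_value_iteration(c, P, tol=1e-8):
    """c[s, a]: one-step cost; P[s, a, s']: transition probabilities."""
    n = c.shape[0]
    h = np.zeros(n)
    while True:
        T = (c + P @ h).min(axis=1)   # Bellman backup under the cost criterion
        g = T[0]                      # gain estimate at a reference state
        h_new = T - g                 # relative (bias-like) values
        if np.abs(h_new - h).max() < tol:
            return g, h_new           # approximate optimal average cost, bias
        h = h_new
```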
- Book Chapter
8
- 10.1007/978-3-319-47766-4_9
- Jan 1, 2017
This chapter considers the ambulance dispatch problem, in which one must decide which ambulance to send to an incident in real time. In practice as well as in literature, it is commonly believed that the closest idle ambulance is the best choice. This chapter describes alternatives to the classical closest idle ambulance rule. Our first method is based on a Markov decision problem (MDP), which constitutes the first known MDP model for ambulance dispatching. Moreover, in the broader field of dynamic ambulance management, this is the first MDP that captures more than just the number of idle vehicles, while remaining computationally tractable for reasonably-sized ambulance fleets. We analyze the policy obtained from this MDP, and transform it to a heuristic for ambulance dispatching that can handle the real-time situation more accurately than our MDP states can describe. We evaluate our policies by simulating a realistic emergency medical services region in the Netherlands. For this region, we show that our heuristic reduces the fraction of late arrivals by 13% compared to the “closest idle” benchmark policy. This result sheds new light on the popular belief that deviating from the closest idle dispatch policy cannot greatly improve the objective.
- Conference Article
12
- 10.1109/isic.2001.971476
- Sep 5, 2001
A number of well-known methods exist for solving Markov decision problems (MDPs) involving a single decision-maker, with or without model uncertainty. Recently, there has been great interest in the multi-agent version of the problem, where there are multiple interacting decision makers. However, most of the suggested methods for multi-agent MDPs require complete knowledge concerning the state and action of all agents. This, in turn, results in a large communication overhead when the agents are physically distributed. In this paper, we address the problem of coping with uncertainty regarding the agent states and actions with different amounts of communication. In particular, assuming a known model and common reward structure, hidden Markov models and techniques for partially observed MDPs are combined to estimate the states or actions (or both) of other agents. Simulation results are presented to compare the performance that can be realized under different assumptions on agent communications.
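The estimation step described here rests on the standard recursive Bayes (HMM forward) filter; the sketch below shows a generic belief update, with the transition and observation models as placeholder assumptions rather than the paper's specific setup.

```python
import numpy as np

# Generic belief update for estimating another agent's hidden state: push the
# prior belief through the transition model, then reweight by the observation
# likelihood. Shapes and models here are illustrative.
def belief_update(b, a, o, P, O):
    """b[s]: prior belief; P[a][s, s']: transitions under action a;
    O[s', o]: probability of observing o in state s'."""
    predicted = b @ P[a]                 # push belief through the dynamics
    posterior = predicted * O[:, o]      # weight by observation likelihood
    return posterior / posterior.sum()   # renormalize to a distribution
```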