Stochastic Processes with Expected Stopping Time
Markov chains are the de facto finite-state model for stochastic dynamical systems, and Markov decision processes (MDPs) extend Markov chains by incorporating non-deterministic behaviors. Given an MDP and rewards on states, a classical optimization criterion is the maximal expected total reward where the MDP stops after T steps, which can be computed by a simple dynamic-programming algorithm. We consider a natural generalization of the problem where the stopping time can be chosen according to a probability distribution whose expectation is T, so as to optimize the expected total reward. Quite surprisingly, we establish inter-reducibility of the expected stopping-time problem for Markov chains with the Positivity problem (which is related to the well-known Skolem problem), for which establishing either decidability or undecidability would be a major breakthrough. Given the hardness of the exact problem, we consider its approximate version: we show that it can be solved in exponential time for Markov chains and in exponential space for MDPs.
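The classical finite-horizon criterion mentioned in the abstract (maximal expected total reward when the process stops after T steps) can be sketched with a simple backward dynamic-programming recursion. The two-state MDP, its actions, and its rewards below are hypothetical illustrations, not taken from the paper:

```python
# Illustrative sketch (not from the paper): finite-horizon dynamic
# programming for the maximal expected total reward of an MDP that
# runs for T stages. The MDP below is made up for demonstration.

def max_expected_reward(P, r, T, s0):
    """P[a][s] is a dict {next_state: probability}; r[s] is the reward
    collected in the current state at each of the T stages."""
    states = list(r)
    V = {s: 0.0 for s in states}  # value with 0 stages to go
    for _ in range(T):
        # Bellman backup: collect r[s], then act optimally afterwards.
        V = {s: r[s] + max(sum(p * V[t] for t, p in P[a][s].items())
                           for a in P)
             for s in states}
    return V[s0]

# Two-state example: action 'stay' keeps the state, 'move' flips it.
P = {'stay': {0: {0: 1.0}, 1: {1: 1.0}},
     'move': {0: {1: 1.0}, 1: {0: 1.0}}}
r = {0: 0.0, 1: 1.0}
print(max_expected_reward(P, r, 3, 0))  # optimal: move to state 1, then stay
```

Starting in state 0 with three stages, the optimal policy collects 0, moves to state 1, and collects 1 twice, for a total of 2.0.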
- Conference Article
3
- 10.1109/lics52264.2021.9470595
- Jun 29, 2021
- Research Article
482
- 10.1137/1009030
- Apr 1, 1967
- SIAM Review
Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V. Denardo
- Research Article
73
- 10.1137/1130036
- Jun 1, 1986
- Theory of Probability & Its Applications
Semi-Markov and Jump Markov Controlled Models: Average Cost Criterion
- Research Article
33
- 10.4233/uuid:201d5145-0717-4dea-b0d0-c018e510fdaa
- Nov 3, 2014
- Research Repository (Delft University of Technology)
Stochastic hybrid systems involve the coupling of discrete, continuous, and probabilistic phenomena, in which the composition of continuous and discrete variables captures the behavior of physical systems interacting with digital, computational devices. Because of their versatility and generality, methods for modeling, analysis, and verification of stochastic hybrid systems (SHS) have proved invaluable in a wide range of applications, including biology, smart grids, air traffic control, finance, and automotive systems. The problems of verification and of controller synthesis over SHS can be algorithmically studied using methodologies and tools developed in computer science, utilizing proper symbolic models describing the overall behaviors of the SHS. A promising direction for addressing formal verification and synthesis against complex logic specifications, such as PCTL and BLTL, is the use of abstractions with finitely many states. This thesis is devoted to formal abstractions for verification and synthesis of SHS, bridging the gap between stochastic analysis, computer science, and control engineering. An SHS is first considered as a discrete-time Markov process over a general state space, and then abstracted as a finite-state Markov chain to be formally verified against the desired specification. We generate finite abstractions of general state-space Markov processes based on a partitioning of the state space, which provides a Markov chain as an approximation of the original process. We put forward a novel adaptive and sequential gridding algorithm based on non-uniform quantization of the state space that is expected to conform to the underlying dynamics of the model and thus to mitigate the curse of dimensionality unavoidably related to the partitioning procedure. PCTL and BLTL properties are defined over trajectories of a system. Examples of such properties are probabilistic safety and reach-avoid specifications.
While the developed techniques are applicable to a wide arena of probabilistic properties, the thesis focuses on the study of one particular specification, probabilistic safety or invariance over a finite horizon. Abstraction of controlled discrete-time Markov processes to Markov decision processes over finite sets of states is also studied in the thesis. The proposed abstraction scheme enables us to solve the problem of obtaining a maximally safe Markov policy for the Markov decision process and to synthesize a control policy for the original model. The total error, due to the abstraction procedure and to exporting the result back to the original process, is quantified. The abstraction error hinges on the regularity of the stochastic kernel of the process, i.e. its Lipschitz continuity. Furthermore, this thesis extends the results in the following directions: 1) Partially degenerate stochastic processes suffer from a non-smooth probabilistic evolution of states. The stochastic kernel of such processes does not satisfy Lipschitz continuity assumptions, which requires us to develop new techniques specialized to this class of processes. We have shown that the probabilistic invariance problem over such processes can be separated into two parts: a deterministic reachability analysis, and a probabilistic invariance problem that depends on the outcome of the first. This decomposition approach leads to computational improvements. 2) The abstraction approach has leveraged piecewise-constant interpolations of the stochastic kernel of the process. We extend this approach to systems with higher degrees of smoothness in their probabilistic evolution and provide approximation methods via higher-order interpolations that aim to require less computational effort. Using higher-order interpolations (versus piecewise-constant ones) can be beneficial in terms of obtaining tighter bounds on the approximation error.
Furthermore, since the approximation procedures depend on the partitioning of the state space, higher-order schemes display an interesting tradeoff between more parsimonious representations versus more complex local computation. From the application point of view, an example of SHS is the model of thermostatically controlled loads (TCLs), which captures the evolution of temperature inside a building. This thesis proposes a new, formal two-step abstraction procedure to generate a finite stochastic dynamical model as the aggregation of the dynamics of a population of TCLs. The approach relaxes the limiting assumptions employed in the literature by providing a model based on the natural probabilistic evolution of the single TCL temperature. We also describe a dynamical model for the time evolution of the abstraction, and develop a set-point control strategy aimed at reference tracking over the total power consumption of the TCL population. The abstraction algorithms discussed in this thesis have been implemented as a MATLAB tool FAUST2 (abbreviation for “Formal Abstractions of Uncountable-STate STochastic processes”). The software is freely available for download at http://sourceforge.net/projects/faust2/.
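As a toy illustration of the partition-based abstraction pipeline described above, the following sketch abstracts a scalar affine system with Gaussian noise into a finite Markov chain and evaluates finite-horizon probabilistic invariance on the abstraction. It is a hypothetical example, not the FAUST2 implementation: the dynamics, grid, and function names are made up, and it uses a uniform grid rather than the thesis's adaptive gridding.

```python
import math

def normal_cdf(x, mu, sigma):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def abstract_chain(a=0.8, sigma=0.1, lo=0.0, hi=1.0, n=20):
    """Abstract x' = a*x + N(0, sigma^2) on [lo, hi] into an n-state chain.

    State i is the cell [lo + i*h, lo + (i+1)*h); transitions use the kernel
    evaluated at the cell's representative point (its center). Mass leaking
    outside [lo, hi] is simply dropped, so rows may sum to less than 1.
    """
    h = (hi - lo) / n
    centers = [lo + (i + 0.5) * h for i in range(n)]
    P = []
    for c in centers:
        mu = a * c
        row = [normal_cdf(lo + (j + 1) * h, mu, sigma)
               - normal_cdf(lo + j * h, mu, sigma)
               for j in range(n)]
        P.append(row)
    return P

def invariance_probability(P, horizon):
    """Probability of staying in the safe set (the gridded interval) for
    `horizon` steps, via the backward recursion V_k(i) = sum_j P[i][j] V_{k+1}(j).

    Because leaked mass was dropped from P, leaving the interval is
    automatically counted as violating invariance.
    """
    n = len(P)
    V = [1.0] * n
    for _ in range(horizon):
        V = [sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V
```

The abstraction error of such a scheme is what the thesis quantifies via Lipschitz continuity of the kernel; a finer grid (larger n) tightens the approximation at the cost of more states.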
- Single Book
69
- 10.1201/b15998
- Apr 19, 2016
Contents:
- Markov Chain Structure and Models: Historical Note; States and Transitions; Model of the Weather; Random Walks; Estimating Transition Probabilities; Multiple-Step Transition Probabilities; State Probabilities after Multiple Steps; Classification of States; Markov Chain Structure; Markov Chain Models; Problems; References
- Regular Markov Chains: Steady State Probabilities; First Passage to a Target State; Problems; References
- Reducible Markov Chains: Canonical Form of the Transition Matrix; The Fundamental Matrix; Passage to a Target State; Eventual Passage to a Closed Set Within a Reducible Multichain; Limiting Transition Probability Matrix; Problems; References
- A Markov Chain with Rewards (MCR): Rewards; Undiscounted Rewards; Discounted Rewards; Problems; References
- A Markov Decision Process (MDP): An Undiscounted MDP; A Discounted MDP; Problems; References
- Special Topics: State Reduction and Hidden Markov Chains: State Reduction; An Introduction to Hidden Markov Chains; Problems; References
- Index
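The steady-state computation covered under "Regular Markov Chains" can be sketched in a few lines. This is a generic power-iteration example, not code from the book, applied to a hypothetical two-state weather-style chain:

```python
def steady_state(P, iterations=500):
    """Steady-state distribution of a regular Markov chain by power iteration.

    For a regular chain, pi = pi P has a unique probability solution, which is
    approached from any initial distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical weather chain: state 0 = sunny, state 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
```

Solving pi = pi P by hand for this chain gives pi = (5/6, 1/6), which the iteration reproduces.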
- Book Chapter
- 10.1016/s0169-7161(05)80130-0
- Jan 1, 1993
- Handbook of Statistics
6
Algorithms and complexity for Markov processes
- Conference Article
5
- 10.1109/lics.2019.8785706
- Jun 1, 2019
Graph planning gives rise to fundamental algorithmic questions such as shortest path, traveling salesman problem, etc. A classical problem in discrete planning is to consider a weighted graph and construct a path that maximizes the sum of weights for a given time horizon T. However, in many scenarios, the time horizon is not fixed, but the stopping time is chosen according to some distribution such that the expected stopping time is T. If the stopping time distribution is not known, then to ensure robustness, the distribution is chosen by an adversary, to represent the worst-case scenario. A stationary plan always chooses the same outgoing edge at every vertex. For a fixed horizon or a fixed stopping-time distribution, stationary plans are not sufficient for optimality. Quite surprisingly, we show that when an adversary chooses the stopping-time distribution with expected stopping time T, then stationary plans are sufficient. While computing optimal stationary plans for a fixed horizon is NP-complete, we show that computing optimal stationary plans under an adversarial stopping-time distribution can be achieved in polynomial time. Consequently, our polynomial-time algorithm for adversarial stopping time also computes an optimal plan among all possible plans.
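The fixed-horizon baseline that this work generalizes can be sketched with backward dynamic programming over all (not just stationary) plans. This is a hypothetical example, not the paper's algorithm: it assumes vertex weights collected on entering a vertex (the paper's exact weighting convention may differ) and that every vertex has at least one outgoing edge.

```python
def max_weight_plan(adj, weight, T, start):
    """Maximum total weight collectable in exactly T steps from `start`.

    adj[v] lists the successors of v (assumed non-empty); weight[v] is
    collected on entering v. Backward DP over the remaining horizon:
    V_t(v) = max over successors u of weight[u] + V_{t-1}(u).
    """
    V = {v: 0.0 for v in adj}          # value with 0 steps remaining
    for _ in range(T):
        V = {v: max(weight[u] + V[u] for u in adj[v]) for v in adj}
    return V[start]
```

On a small graph where vertex b (weight 5) can only be revisited every other step, the optimal plan alternates a -> b -> a, which is exactly the non-stationary behavior that makes stationary plans suboptimal for fixed horizons.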
- Conference Article
- 10.1145/320599.322508
- Jan 1, 1985
In this paper we obtain two closely related theorems that essentially say that, no matter what information metric is used, on average the value of the accumulated information at stopping time is bounded by a multiple of the expected stopping time. These results are also independent of the particular stopping strategy employed, although they do require that the expected stopping time be finite. These results, along with a general type of stopping strategy based on incremental information, are given. Later we apply our general theorem to a specific stopping strategy associated with the GIS model. Although we concentrate on the problem of stopping, the information function on which this stopping decision is based can also be used to choose the COA for the next cycle of the feedback loop. We apply our results to an estimation problem involving the well-known Shannon-Wiener measure of information. Since our theorems require that the expected stopping times be finite, some time is devoted to a discussion of necessary and sufficient conditions for finite expected stopping times.
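The flavor of this result (expected accumulated value tied to a multiple of the expected stopping time) is close in spirit to Wald's identity, E[S_T] = E[X] * E[T] for i.i.d. increments X_k stopped at a time T with E[T] finite. The illustrative sketch below, which is not the paper's GIS-specific strategy, exactly enumerates a process with positive increments stopped at a threshold and checks the identity with rational arithmetic:

```python
from fractions import Fraction

def enumerate_stopped(increments, threshold):
    """Exactly enumerate all paths of an i.i.d. increment process that is
    stopped as soon as the accumulated value reaches `threshold`.

    `increments` maps each possible positive increment to its probability.
    Positivity of the increments guarantees that every path stops after
    finitely many steps, so the enumeration terminates.
    Returns (E[S_T], E[T]) as exact fractions.
    """
    e_s, e_t = Fraction(0), Fraction(0)
    stack = [(Fraction(0), 0, Fraction(1))]   # (accumulated, steps, probability)
    while stack:
        s, t, p = stack.pop()
        if s >= threshold:
            e_s += p * s
            e_t += p * t
            continue
        for x, px in increments.items():
            stack.append((s + x, t + 1, p * px))
    return e_s, e_t
```

For increments 1 or 2 with probability 1/2 each and threshold 3, the enumeration gives E[T] = 9/4 and E[S_T] = 27/8 = (3/2) * (9/4), matching Wald's identity.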
- Research Article
2
- 10.1016/0270-0255(85)90034-x
- Jan 1, 1985
- Mathematical Modelling
Some dynamical properties of sequentially acquired information
- Book Chapter
8
- 10.1007/978-3-030-59152-6_14
- Jan 1, 2020
Discrete-time Markov Chains (MCs) and Markov Decision Processes (MDPs) are two standard formalisms in system analysis. Their main associated quantitative objectives are hitting probabilities, discounted sum, and mean payoff. Although there are many techniques for computing these objectives in general MCs/MDPs, they have not been thoroughly studied in terms of parameterized algorithms, particularly when treewidth is used as the parameter. This is in sharp contrast to qualitative objectives for MCs, MDPs and graph games, for which treewidth-based algorithms yield significant complexity improvements. In this work, we show that treewidth can also be used to obtain faster algorithms for the quantitative problems. For an MC with n states and m transitions, we show that each of the classical quantitative objectives can be computed in \(O((n+m)\cdot t^2)\) time, given a tree decomposition of the MC with width t. Our results also imply a bound of \(O(\kappa \cdot (n+m)\cdot t^2)\) for each objective on MDPs, where \(\kappa \) is the number of strategy-iteration refinements required for the given input and objective. Finally, we make an experimental evaluation of our new algorithms on low-treewidth MCs and MDPs obtained from the DaCapo benchmark suite. Our experiments show that on low-treewidth MCs and MDPs, our algorithms outperform existing well-established methods by one or more orders of magnitude.
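For concreteness, the hitting-probability objective on an MC reduces to a linear system. The dense baseline below, which the paper's treewidth-based algorithms accelerate, is a generic sketch (not the paper's code); it assumes the caller pre-classifies states that cannot reach the target into `fail`, so the remaining system is nonsingular.

```python
def hitting_probabilities(P, target, fail=frozenset()):
    """Probability of ever reaching `target` from each state of a finite MC.

    Solve x_i = 1 on `target`, x_i = 0 on `fail` (states that cannot reach
    the target), and x_i = sum_j P[i][j] * x_j elsewhere, by Gaussian
    elimination with partial pivoting.
    """
    n = len(P)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n):
        if i in target or i in fail:
            A[i][i] = 1.0
            b[i] = 1.0 if i in target else 0.0
        else:
            for j in range(n):
                A[i][j] = (1.0 if i == j else 0.0) - P[i][j]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                for cidx in range(col, n):
                    A[r][cidx] -= f * A[col][cidx]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]
```

This dense elimination costs O(n^3); the point of the paper is that a tree decomposition of width t brings the quantitative objectives down to O((n+m) * t^2).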
- Conference Article
- 10.5555/3329995.3329998
- Aug 18, 2017
I will survey a body of work, developed over the past decade or so, on algorithms for, and the computational complexity of, analyzing and model checking some important families of countably infinite-state Markov chains, Markov decision processes (MDPs), and stochastic games. These models arise by adding natural forms of recursion, branching, or a counter, to finite-state models, and they correspond to probabilistic/control/game extensions of classic automata-theoretic models like pushdown automata, context-free grammars, and one-counter automata. They subsume some classic stochastic processes such as multi-type branching processes and quasi-birth-death processes. They also provide a natural model for probabilistic procedural programs with recursion. Some of the key algorithmic advances for analyzing these models have come from algorithms for computing the least fixed point (and greatest fixed point) solution for corresponding monotone systems of nonlinear (min/max)-polynomial equations. Such equations provide, for example, the Bellman optimality equations for optimal extinction and reachability probabilities for branching MDPs (BMDPs). A key role in these algorithms is played by Newton's method, and by a generalization of Newton's method which is applicable to the Bellman equations for BMDPs, and which uses linear programming in each iteration. By now, polynomial-time algorithms have been developed for some of the key problems in this domain, while other problems have been shown to have high complexity, or even to be undecidable. Yet many algorithmic questions about these models remain open. I will highlight some of the open questions. (This talk partly describes joint work with Alistair Stewart and Mihalis Yannakakis.)
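A minimal instance of the least-fixed-point computations mentioned in the talk: the extinction probability of a single-type branching process is the least solution of q = f(q), where f is the offspring probability generating function. The sketch below uses plain fixed-point (Kleene) iteration from 0, which converges monotonically to the least solution; the Newton-based methods surveyed in the talk converge much faster.

```python
def extinction_probability(offspring_pmf, tol=1e-12, max_iter=100000):
    """Extinction probability of a single-type branching process.

    offspring_pmf maps each offspring count k to its probability p_k; the
    generating function is f(s) = sum_k p_k * s^k. Iterating q <- f(q) from
    q = 0 converges monotonically up to the least fixed point of f.
    """
    def f(s):
        return sum(p * s**k for k, p in offspring_pmf.items())
    q = 0.0
    for _ in range(max_iter):
        nxt = f(q)
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q
```

For offspring distribution p_0 = 1/4, p_2 = 3/4, the fixed-point equation q = 1/4 + (3/4)q^2 has solutions 1/3 and 1, and the iteration correctly returns the least one, 1/3.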
- Research Article
20
- 10.1137/1129064
- Jan 1, 1985
- Theory of Probability & Its Applications
Previous article Next article Adaptive Strategies for Certain Classes of Controlled Markov ProcessesE. I. GordienkoE. I. Gordienkohttps://doi.org/10.1137/1129064PDFBibTexSections ToolsAdd to favoritesExport CitationTrack CitationsEmail SectionsAbout[1] V. N. Fomin, , A. L. Fradkov and , V. A. Yakubovich, Adaptive Control of Dynamic Objects, Nauka, Moscow, 1981, (In Russian.) 0522.93002 Google Scholar[2] V. G. Sragovich, Theory of Adaptive Systems, Nauka, 1976Moscow, (In Russian.) 0333.93005 Google Scholar[3] Yu. V. Popov, Adaptive systems for the control of certain classes of random processes of general type, Studies in the theory of adaptive systems (Russian), Vyčisl. Centr, Akad. Nauk SSSR, Moscow, 1976, 119–142, 223, (In Russian.) 58:33328 Google Scholar[4] G. A. Agasandyan, Adaptive system for homogeneous processes with continuous sets of states and controls, Theory Prob. Appl., 24 (1979), 515–528 0409.93030 Google Scholar[5] E. I. Gordienko, Adaptive optimal control of some Markov processes, Dokl. Akad. Nauk SSSR, 261 (1981), 271–275, (In Russian.) 83b:93041 0494.93027 Google Scholar[6] Ye. B. Dynkin and , A. A. Yushkevich, Controlled Markov processes, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Vol. 235, Springer-Verlag, Berlin, 1979xvii+289 80k:90037 CrossrefGoogle Scholar[7] J. Doob, Stochastic processes, John Wiley & Sons Inc., New York, 1953viii+654 15,445b 0053.26802 Google Scholar[8] V. V. Kalashnikov, Qualitative Analysis of the Behavior of Complex Systems by the Method of Probe Functions, Nauka, Moscow, 1978, (In Russian.) 0451.93002 Google Scholar[9] L. G. Glubenko and , E. S. Shtatland, On controlled Markov processes with discrete timeTheory of Probability and Mathematical Statistics, Vol. 7, Naukova Dumka, Kiev, 1972, 51–64, (In Russian.) Google Scholar[10] V. V. Petrov, Sums of independent random variables, Springer-Verlag, New York, 1975x+346 52:9335 0322.60042 CrossrefGoogle Scholar[11] P. 
Ganssler and , W. Stute, Empirical processes: a survey of results for independent and identically distributed random variables, Ann. Probab., 7 (1979), 193–243 80d:60002 CrossrefGoogle Scholar[12] Patrick Billingsley and , Flemming Topsøe, Uniformity in weak convergence, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 7 (1967), 1–16 35:326 0147.15701 CrossrefGoogle Scholar[13] A. N. Kolmogorov and , V. M. Tikhomirov, $\varepsilon$-entropy and $\varepsilon$-capacity of sets in function spaces, Uspehi Mat. Nauk, 14 (1959), 3–86 22:2890 Google Scholar[14] R. M. Dudley, The speed of mean Glivenko-Cantelli convergence, Ann. Math. Statist, 40 (1968), 40–50 38:5270 0184.41401 CrossrefGoogle Scholar Previous article Next article FiguresRelatedReferencesCited ByDetails Asymptotically Optimal Strategies for Adaptive Zero-Sum Discounted Markov GamesJ. Adolfo Minjárez-Sosa and Oscar Vega-AmayaSIAM Journal on Control and Optimization, Vol. 48, No. 3 | 15 April 2009AbstractPDF (230 KB)Empirical estimation in average Markov control processesApplied Mathematics Letters, Vol. 21, No. 5 | 1 May 2008 Cross Ref Average Optimality for Adaptive Markov Control Processes with Unbounded Costs and Unknown Disturbance DistributionMarkov Processes and Controlled Markov Chains | 1 Jan 2002 Cross Ref Approximation of average cost optimal policies for general Markov decision processes with unbounded costsMathematical Methods of Operations Research, Vol. 45, No. 2 | 1 Jun 1997 Cross Ref Recurrence conditions for Markov decision processes with Borel state space: A surveyAnnals of Operations Research, Vol. 28, No. 1 | 1 Dec 1991 Cross Ref Nonparametric estimation and adaptive control in a class of finite Markov decision chainsAnnals of Operations Research, Vol. 28, No. 1 | 1 Dec 1991 Cross Ref Density estimation and adaptive control of markov processes: Average and discounted criteriaActa Applicandae Mathematicae, Vol. 20, No. 
3 | 1 Sep 1990 Cross Ref Nonparametric adaptive control of discrete-time partially observable stochastic systemsJournal of Mathematical Analysis and Applications, Vol. 137, No. 2 | 1 Feb 1989 Cross Ref Continuous dependence of stochastic control models on the noise distributionApplied Mathematics & Optimization, Vol. 17, No. 1 | 1 Jan 1988 Cross Ref Adaptive policies for discrete-time stochastic control systems with unknown disturbance distributionSystems & Control Letters, Vol. 9, No. 4 | 1 Oct 1987 Cross Ref Adaptive control of stochastic systems with unknown noise distribution--Discounted reward criterion1986 25th IEEE Conference on Decision and Control | 1 Dec 1986 Cross Ref Volume 29, Issue 3| 1985Theory of Probability & Its Applications427-645 History Submitted:12 April 1981Published online:17 July 2006 InformationCopyright © Society for Industrial and Applied MathematicsPDF Download Article & Publication DataArticle DOI:10.1137/1129064Article page range:pp. 504-518ISSN (print):0040-585XISSN (online):1095-7219Publisher:Society for Industrial and Applied Mathematics
- Book Chapter
1
- 10.1017/cbo9781316471104.009
- Jan 1, 2016
A Markov decision process (MDP) is a Markov process with feedback control. That is, as illustrated in Figure 6.1, a decision-maker (controller) uses the state x_k of the Markov process at each time k to choose an action u_k. This action is fed back to the Markov process and controls the transition matrix P(u_k). This in turn determines the probability that the Markov process jumps to a particular state x_{k+1} at time k+1, and so on. The aim of the decision-maker is to choose a sequence of actions over a time horizon to minimize a cumulative cost function associated with the expected value of the trajectory of the Markov process. MDPs arise in stochastic optimization models in telecommunication networks, discrete event systems, inventory control, finance, investment and health planning. Also, POMDPs can be viewed as continuous-state MDPs. This chapter gives a brief description of MDPs, which provides a starting point for POMDPs. The main result is that the optimal choice of actions by the controller in Figure 6.1 is obtained by solving a backward stochastic dynamic programming problem. Finite-state finite-horizon MDP: Let k = 0, 1, …, N denote discrete time. N is called the time horizon or planning horizon. In this section we consider MDPs where the horizon N is finite. The finite-state MDP model consists of the following ingredients: 1. X = {1, 2, …, X} denotes the state space and x_k ∈ X denotes the state of the controlled Markov chain at time k = 0, 1, …, N. 2. U = {1, 2, …, U} denotes the action space. The elements u ∈ U are called actions. In particular, u_k ∈ U denotes the action chosen at time k. 3. For each action u ∈ U and time k ∈ {0, …, N−1}, P(u, k) denotes an X × X transition probability matrix with elements P_ij(u, k) = ℙ(x_{k+1} = j | x_k = i, u_k = u), i, j ∈ X. 4. For each state i ∈ X, action u ∈ U and time k ∈ {0, 1, …, N−1}, the scalar c(i, u, k) denotes the one-stage cost incurred by the decision-maker (controller).
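The backward dynamic programming recursion described in the chapter can be sketched as follows. This is a generic implementation, not the book's code: for brevity it takes time-invariant transition matrices P(u) rather than the chapter's P(u, k), and variable names are illustrative.

```python
def backward_induction(P, c, terminal_cost, N):
    """Finite-horizon MDP solved by backward dynamic programming.

    P[u][i][j]: transition probability i -> j under action u.
    c[i][u]: one-stage cost in state i under action u (time-invariant here).
    terminal_cost[i]: cost incurred at the horizon N.
    Returns (value function at time 0, policy) where policy[k][i] is the
    optimal action in state i at time k.
    """
    X = len(terminal_cost)
    U = len(P)
    V = list(terminal_cost)
    policy = []
    for _ in range(N):
        # Q-values: immediate cost plus expected cost-to-go.
        Q = [[c[i][u] + sum(P[u][i][j] * V[j] for j in range(X))
              for u in range(U)]
             for i in range(X)]
        policy.append([min(range(U), key=lambda u: Q[i][u]) for i in range(X)])
        V = [min(Q[i]) for i in range(X)]
    policy.reverse()   # policy[k] is the decision rule at time k
    return V, policy
```

A small sanity check: with state 1 absorbing and cost-free, and state 0 offering "stay for cost 1 per step" (action 0) or "jump to state 1 for a one-off cost of 3" (action 1), the optimal value from state 0 is min(N, 3), i.e. jump as soon as more than 3 steps remain.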
- Conference Article
4
- 10.1007/978-3-662-49630-5_18
- Jan 1, 2020
- HAL (Le Centre pour la Communication Scientifique Directe)
Given two labelled Markov decision processes (MDPs), the trace-refinement problem asks whether for all strategies of the first MDP there exists a strategy of the second MDP such that the induced labelled Markov chains are trace-equivalent. We show that this problem is decidable in polynomial time if the second MDP is a Markov chain. The algorithm is based on new results on a particular notion of bisimulation between distributions over the states. However, we show that the general trace-refinement problem is undecidable, even if the first MDP is a Markov chain. Decidability of those problems was stated as open in 2008. We further study the decidability and complexity of the trace-refinement problem provided that the strategies are restricted to be memoryless.
- Research Article
- 10.23638/lmcs-16(2:10)2020
- Jun 3, 2020
- Logical Methods in Computer Science
Given two labelled Markov decision processes (MDPs), the trace-refinement problem asks whether for all strategies of the first MDP there exists a strategy of the second MDP such that the induced labelled Markov chains are trace-equivalent. We show that this problem is decidable in polynomial time if the second MDP is a Markov chain. The algorithm is based on new results on a particular notion of bisimulation between distributions over the states. However, we show that the general trace-refinement problem is undecidable, even if the first MDP is a Markov chain. Decidability of those problems was stated as open in 2008. We further study the decidability and complexity of the trace-refinement problem provided that the strategies are restricted to be memoryless.