A motion strategy for finding human faces with a drone

Abstract

In this paper, we address the problem of detecting a human face with a drone equipped with a monocular camera. We assume that there is a person inside the drone's field of view, and the drone wants to move to a position from which it observes the person's face from the front. Our approach combines localization of the drone relative to the person with visual detection of human faces. We model the problem as a partially observable Markov decision process (POMDP) and solve it using stochastic dynamic programming with imperfect information to compute motion strategies, along with deep neural networks to estimate the location of the drone and to infer depth from monocular images for navigation and obstacle avoidance. The approach is evaluated through simulations and experiments in real environments, dealing with obstacles that generate motion constraints and visibility obstructions. Additionally, the proposed method is compared with other alternatives, including a single-shot approach based on YOLO, and shown to outperform them.
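
As a rough, hypothetical illustration of the POMDP machinery the abstract describes (belief tracking over the drone's pose relative to the person, driven by the output of a face detector), the sketch below discretizes the drone's bearing around the person into eight sectors. The state space, action set, and detector rates are all invented for the example and are not the paper's actual model.

```python
import numpy as np

# Hypothetical discretization: the drone's bearing around the person,
# in 8 sectors; sector 0 is directly in front of the face.
N_SECTORS = 8
ACTIONS = [-1, 0, +1]          # orbit clockwise, hover, orbit counter-clockwise

def transition_matrix(a, slip=0.1):
    """P[s' | s, a]: move one sector with prob 1-slip, stay otherwise."""
    P = np.zeros((N_SECTORS, N_SECTORS))
    for s in range(N_SECTORS):
        P[s, (s + a) % N_SECTORS] += 1.0 - slip
        P[s, s] += slip
    return P

def observation_likelihood(z, detector_tpr=0.9, detector_fpr=0.05):
    """P[z | s']: the face detector fires mostly when the drone is frontal."""
    frontal = np.array([s in (0, 1, N_SECTORS - 1) for s in range(N_SECTORS)])
    p_detect = np.where(frontal, detector_tpr, detector_fpr)
    return p_detect if z == 1 else 1.0 - p_detect

def belief_update(b, a, z):
    """Standard Bayes filter over the discrete state space."""
    b_pred = transition_matrix(a).T @ b          # predict through the motion model
    b_new = observation_likelihood(z) * b_pred   # weight by the detection outcome
    return b_new / b_new.sum()

b = np.full(N_SECTORS, 1.0 / N_SECTORS)   # uniform prior over bearings
b = belief_update(b, a=+1, z=0)           # orbit one sector, face not detected
print(b.round(3))
```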

Similar Papers
  • Research Article
  • Cited by: 454
  • 10.1137/1009030
Contraction Mappings in the Theory Underlying Dynamic Programming
  • Apr 1, 1967
  • SIAM Review
  • Eric V Denardo


  • Dissertation
  • Cited by: 1
  • 10.14264/uql.2020.236
Tractable POMDP-planning for robots with complex non-linear dynamics
  • Mar 16, 2020
  • Marcus Hoerger

Planning under partial observability is an essential capability of autonomous robots. While robots operate in the real world, they are inherently subject to various uncertainties, such as control and sensing errors and limited information regarding the operating environment. Conceptually, these types of planning problems can be solved in a principled manner when framed as a Partially Observable Markov Decision Process (POMDP). POMDPs model the aforementioned uncertainties as conditional probability functions and estimate the state of the system as probability functions over the state space, called beliefs. Instead of computing the best strategy with respect to single states, POMDP solvers compute the best strategy with respect to beliefs. Solving a POMDP exactly is computationally intractable in general. However, in the past two decades we have seen tremendous progress in the development of approximately optimal solvers that trade optimality for computational tractability. Despite this progress, approximately solving POMDPs for systems with complex non-linear dynamics remains challenging. Most state-of-the-art solvers rely on a large number of expensive forward simulations of the system to find an approximately optimal strategy. For systems with complex non-linear dynamics that admit no closed-form solution, this strategy can become prohibitively expensive. Another difficulty in applying POMDPs to physical robots with complex transition dynamics is that almost all implementations of state-of-the-art on-line POMDP solvers restrict the user to specific data structures for the POMDP model, and the model has to be hard-coded within the solver implementation. This, in turn, severely hinders the process of applying POMDPs to physical robots. In this thesis we aim to make POMDPs more practical for realistic robotic motion-planning tasks under partial observability. We show that systematic approximations of complex, non-linear transition dynamics can be used to design on-line POMDP solvers that are more efficient than current solvers. Furthermore, we propose a new software framework that supports the user in modeling complex planning problems under uncertainty with minimal implementation effort.
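
To make the cost argument concrete: most online POMDP solvers assume only a black-box generative model and evaluate candidate strategies by repeated forward simulation, so the per-step cost of the dynamics multiplies across thousands of rollouts. The sketch below uses an assumed `GenerativeModel` protocol for illustration; it is not the thesis's actual software framework.

```python
from typing import Protocol, Tuple, Any

class GenerativeModel(Protocol):
    """The black-box interface most online POMDP solvers assume:
    sample (s', z, r) given (s, a), with no closed form required."""
    def step(self, state: Any, action: Any) -> Tuple[Any, Any, float]: ...

def rollout_value(model: GenerativeModel, state, policy, depth, gamma=0.95):
    """One forward simulation; online solvers run thousands of these per
    planning step, so a costly model.step dominates total runtime."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = policy(state)
        state, _obs, reward = model.step(state, action)
        total += discount * reward
        discount *= gamma
    return total
```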

  • Conference Article
  • Cited by: 2
  • 10.1109/fuzzy.2010.5584614
A Bayesian game based adaptive fuzzy controller for multiagent POMDPs
  • Jul 1, 2010
  • Rajneesh Sharma + 1 more

This paper develops a novel fuzzy reinforcement learning (RL) based controller for multiagent partially observable Markov decision processes (POMDPs) modeled as a sequence of Bayesian games. Multiagent POMDPs have emerged as a powerful framework for modeling and optimizing multiagent sequential decision-making problems under uncertainty, but finding optimal policies is computationally very challenging. Our aim here is twofold: (i) introducing a learning paradigm into infinite-horizon multiagent POMDPs, and (ii) scaling up multiagent POMDP solution approaches through fuzzy inference system (FIS) based generalization. We introduce what may be called fuzzy multiagent POMDPs to overcome the space and time complexity issues involved in finding optimal policies for multiagent POMDPs. The proposed FIS-based RL controller approximates optimal policies for multiagent POMDPs modeled as a sequence of Bayesian games. We empirically evaluate the proposed fuzzy multiagent POMDP controller on the standard benchmark multiagent tiger problem and compare its performance against other state-of-the-art multiagent POMDP solution approaches. The results showcase the effectiveness of the proposed approach and validate the feasibility of employing Bayesian game based RL (in conjunction with FIS approximation) for addressing the intractability of multiagent POMDPs.
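
The FIS-based generalization the abstract mentions can be pictured as approximating Q-values over the continuous belief simplex with a small set of fuzzy rules. The sketch below is a generic Takagi-Sugeno-style approximator with Gaussian memberships, assumed purely for illustration; it is not the paper's controller, whose rule base and update law differ.

```python
import numpy as np

class FuzzyQ:
    """Generic fuzzy Q approximation over the belief simplex:
    Q(b, a) = sum_i phi_i(b) * theta[i, a], with normalized Gaussian
    membership functions centered on prototype beliefs."""
    def __init__(self, prototypes, n_actions, width=0.3, lr=0.1):
        self.c = np.asarray(prototypes)          # (n_rules, n_states)
        self.theta = np.zeros((len(prototypes), n_actions))
        self.width, self.lr = width, lr

    def memberships(self, b):
        d = np.linalg.norm(self.c - b, axis=1)
        phi = np.exp(-(d / self.width) ** 2)
        return phi / phi.sum()

    def q_values(self, b):
        return self.memberships(b) @ self.theta

    def update(self, b, a, target):
        """Gradient step pushing Q(b, a) toward a bootstrapped RL target."""
        phi = self.memberships(b)
        self.theta[:, a] += self.lr * (target - phi @ self.theta[:, a]) * phi
```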

  • Book Chapter
  • Cited by: 2
  • 10.1017/cbo9781316471104.010
Partially observed Markov decision processes (POMDPs)
  • Jan 1, 2016
  • Vikram Krishnamurthy

A POMDP is a controlled HMM. Recall from §2.4 that an HMM consists of an X-state Markov chain {x_k} observed via a noisy observation process {y_k}. Figure 7.1 displays the schematic setup of a POMDP, where the action u_k affects the state and/or observation (sensing) process of the HMM. The HMM filter (discussed extensively in Chapter 3) computes the posterior distribution π_k of the state. The posterior π_k is called the belief state. In a POMDP, the stochastic controller depicted in Figure 7.1 uses the belief state to choose the next action. This chapter is organized as follows. §7.1 describes the POMDP model. Then §7.2 gives the belief-state formulation and Bellman's dynamic programming equation for the optimal policy of a POMDP. It is shown that a POMDP is equivalent to a continuous-state MDP whose states are belief states (posteriors). Bellman's equation for continuous-state MDPs was discussed in §6.3. §7.3 gives a toy example of a POMDP. Although a POMDP is thus a continuous-state MDP, §7.4 shows that for finite-horizon POMDPs, Bellman's equation has a finite-dimensional characterization. §7.5 discusses several algorithms that exploit this finite-dimensional characterization to compute the optimal policy. §7.6 considers discounted-cost infinite-horizon POMDPs. As an example of a POMDP, optimal search for a moving target is discussed in §7.7. Finite-horizon POMDP: a POMDP model with finite horizon N is a 7-tuple (X, U, Y, P(u), B(u), c(u), c_N). [Figure 7.1 caption: Partially observed Markov decision process (POMDP) schematic setup. The Markov system together with the noisy sensor constitutes a hidden Markov model (HMM). The HMM filter computes the posterior (belief state) π_k of the state of the Markov chain. The controller (decision-maker) then chooses the action u_k at time k based on π_k.] 1. X = {1, 2, …, X} denotes the state space, and x_k ∈ X denotes the state of a controlled Markov chain at time k = 0, 1, …, N. 2. U = {1, 2, …, U} denotes the action space, with u_k ∈ U denoting the action chosen at time k by the controller.
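
In this notation, one step of the HMM filter is a single Bayes update over the X states. A minimal sketch, assuming P[u] and B[u] are stored as NumPy arrays with P[u][i, j] = P(x'=j | x=i, u) and B[u][j, y] = P(y | x'=j, u):

```python
import numpy as np

def hmm_filter(pi, u, y, P, B):
    """One step of the HMM filter in the chapter's setup:
    pi_{k+1}(j) ∝ B_{j,y}(u) * sum_i P_{i,j}(u) * pi_k(i)."""
    unnormalized = B[u][:, y] * (P[u].T @ pi)   # predict, then correct
    return unnormalized / unnormalized.sum()
```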

  • Research Article
  • Cited by: 2
  • 10.1613/jair.1.14525
Optimality Guarantees for Particle Belief Approximation of POMDPs
  • Aug 27, 2023
  • Journal of Artificial Intelligence Research
  • Michael H Lim + 4 more

Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of O(C), where C is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
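
A minimal sketch of the particle belief transition the abstract describes, under assumed interfaces (`gen_model` returning (s', z, r) and `obs_density` giving the observation likelihood): each belief is a set of C particles, and one belief transition propagates and reweights every particle, which is the O(C) factor in the transition sampling complexity.

```python
import numpy as np

def pb_mdp_step(particles, action, gen_model, obs_density, rng):
    """One transition of the particle belief MDP (PB-MDP): propagate every
    particle through the POMDP generative model (the O(C) factor,
    C = len(particles)), weight by the observation density of a shared
    sampled observation, then resample."""
    C = len(particles)
    idx = rng.integers(C)
    _, obs, _ = gen_model(particles[idx], action, rng)    # sample one observation
    next_p, rewards, weights = [], [], []
    for s in particles:
        s2, _, r = gen_model(s, action, rng)
        next_p.append(s2)
        rewards.append(r)
        weights.append(obs_density(obs, s2, action))      # likelihood weighting
    w = np.asarray(weights); w /= w.sum()
    reward = float(np.dot(w, rewards))                    # belief-level reward
    resampled = [next_p[i] for i in rng.choice(C, size=C, p=w)]
    return resampled, reward
```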

  • Conference Article
  • 10.24963/ijcai.2024/953
Optimality Guarantees for Particle Belief Approximation of POMDPs (Abstract Reprint)
  • Aug 1, 2024
  • Michael Lim + 4 more

(Abstract identical to the journal article above.)

  • Dissertation
  • Cited by: 10
  • 10.14711/thesis-b710758
Algorithms for partially observable Markov decision processes
  • Dec 23, 2014
  • Weihong Zhang

Partially Observable Markov Decision Process (POMDP) is a general sequential decision-making model where the effects of actions are nondeterministic and only partial information about world states is available. However, finding near-optimal solutions for POMDPs is computationally difficult. Value iteration is a standard algorithm for solving POMDPs. It conducts a sequence of dynamic programming (DP) updates to improve value functions. Value iteration is inefficient for two reasons. First, a DP update is expensive due to the need to account for all belief states in a continuous belief space. Second, value iteration needs to conduct a large number of DP updates before it converges. This thesis investigates two ways to accelerate value iteration. The work presented centers around the idea of conducting DP updates, and therefore value iteration, over a belief subspace, a subset of the belief space. The first use of belief subspaces is to reduce the number of DP updates value iteration needs to converge. We design a computationally cheap procedure that considers a belief subspace consisting of a finite number of belief states. It is used as an additional step for improving value functions. Due to the additional improvements from this procedure, value iteration conducts fewer DP updates and is therefore more efficient. The second use of belief subspaces is to reduce the complexity of DP updates. We establish a framework for carrying out value iteration over a belief subspace determined by a POMDP model. Whether the belief subspace is smaller than the belief space is model dependent. If it is for a given POMDP, value iteration over the belief subspace is expected to be more efficient. Based on this framework, we study three POMDP classes with special problem characteristics and propose different value iteration algorithms for them. (1) An informative POMDP assumes that an agent always has a good idea about the world state. The subspace determined by the model is much smaller than the belief space, so value iteration over the belief subspace is more efficient for this POMDP class. (2) A near-discernible POMDP assumes that the agent can get a good idea about the state once in a while if it executes particular actions. For such a POMDP, the belief subspace determined by the model can be of the same size as the belief space. We propose an anytime value iteration algorithm that focuses computation on a small belief subspace and gradually expands it. (3) A more general class than near-discernible POMDPs assumes that the agent can get a good idea about the state with high likelihood once in a while if it executes particular actions. For such POMDPs, we adapt the anytime algorithm to conduct value iteration over a growing belief subspace.
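
For concreteness, a DP update restricted to a finite set of belief points, in the spirit of the subspace-based updates described above, can be written as a standard point-based backup over alpha-vectors. This is a generic sketch for a discrete POMDP (P[a], B[a], R[a] assumed as transition, observation, and reward matrices), not the thesis's specific algorithms.

```python
import numpy as np

def point_based_backup(beliefs, alphas, P, B, R, gamma):
    """One DP update restricted to a finite belief subspace: for each belief
    point, back up the best alpha-vector per (action, observation) pair.
    alphas: list of length-|S| vectors; P[a] is |S|x|S|, B[a] is |S|x|Z|."""
    new_alphas = []
    A, Z = len(P), B[0].shape[1]
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(A):
            vec = R[a].astype(float).copy()
            for z in range(Z):
                # candidate vectors: project each alpha back through (a, z)
                cand = [gamma * (P[a] @ (B[a][:, z] * g)) for g in alphas]
                vec += cand[int(np.argmax([b @ c for c in cand]))]
            if b @ vec > best_val:
                best_val, best_vec = b @ vec, vec
        new_alphas.append(best_vec)
    return new_alphas
```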

  • Conference Article
  • Cited by: 25
  • 10.5555/1838206.1838386
Closing the learning-planning loop with predictive state representations
  • May 10, 2010
  • Byron Boots + 2 more

A central problem in artificial intelligence is to plan to maximize future reward under uncertainty in a partially observable environment. Models of such environments include Partially Observable Markov Decision Processes (POMDPs) [4] as well as their generalizations, Predictive State Representations (PSRs) [9] and Observable Operator Models (OOMs) [7]. POMDPs model the state of the world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of future events (called tests or characteristic events) conditioned on past events (called histories or indicative events). Unfortunately, exact planning algorithms such as value iteration [14] are intractable for most realistic POMDPs due to the curse of history and the curse of dimensionality [11]. However, PSRs and OOMs hold the promise of mitigating both of these curses: first, many successful approximate planning techniques designed to address these problems in POMDPs can easily be adapted to PSRs and OOMs [8, 6]. Second, PSRs and OOMs are often more compact than their corresponding POMDPs (i.e., they need fewer state dimensions), mitigating the curse of dimensionality. Finally, since tests and histories are observable quantities, it has been suggested that PSRs and OOMs should be easier to learn than POMDPs; with a successful learning algorithm, we can look for a model that ignores all but the most important components of state, reducing dimensionality still further. In this paper we take an important step toward realizing the above hopes. In particular, we propose and demonstrate a fast and statistically consistent spectral algorithm which learns the parameters of a PSR directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Closing the loop is a much more stringent test than simply checking short-term prediction accuracy, since the quality of an optimized policy depends strongly on the accuracy of the model: inaccurate models typically lead to useless plans.
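
A schematic of the spectral learning step in the transformed-PSR style: estimate test-history co-occurrence matrices from data, take a thin SVD, and regress one linear operator per action-observation pair. The estimator below is heavily simplified (matrix names, normalization, and the initial-state estimate are assumptions), and real implementations add regularization and careful sampling corrections.

```python
import numpy as np

def learn_tpsr(P_H, P_TH, P_TaoH, rank):
    """Simplified spectral PSR learning. P_H: (n_hist,) empirical history
    probabilities; P_TH: (n_tests, n_hist) joint test-history probabilities;
    P_TaoH: dict (a, o) -> matrix of the same shape as P_TH."""
    U, _, _ = np.linalg.svd(P_TH, full_matrices=False)
    U = U[:, :rank]                                   # subspace from thin SVD
    pinv = np.linalg.pinv(U.T @ P_TH)                 # (n_hist, rank)
    b1 = U.T @ P_TH.sum(axis=1)                       # initial predictive state
    binf = pinv.T @ P_H                               # normalization vector
    B = {ao: U.T @ M @ pinv for ao, M in P_TaoH.items()}
    return b1, binf, B

def filter_step(b, binf, B, a, o):
    """Update the predictive state after executing a and observing o."""
    b2 = B[(a, o)] @ b
    return b2 / (binf @ b2)
```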

  • Conference Article
  • Cited by: 3
  • 10.1109/ieem.2010.5674294
Optimal maintenance policies for three-states POMDP with quality measurement errors
  • Dec 1, 2010
  • Mohammad M Aldurgam + 1 more

Partially Observed Markov Decision Processes (POMDPs) have been used to model decision making under uncertainty in several areas, including manufacturing, healthcare, business, and military applications. In the POMDP context, systems are considered multi-state systems with hidden states. Common to all POMDP models is the existence of measurements used to infer the actual hidden state of the system at hand. Measurements, however, are generally not error free. The impact of measurement errors on optimal POMDP decision policies is formulated and studied for a three-state deteriorating machine with two quality outcomes and possible quality measurement errors. The decision-making problem is modeled as a Three-Layers Hidden Markov Decision Process (TLHMDP). The objective function of the POMDP problem is shown to be piecewise linear and convex. The impact of measurement errors in the POMDP context is demonstrated with a numerical example.
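
The three-layer structure (hidden machine state, true quality outcome, measured quality outcome) composes into an ordinary POMDP observation model by multiplying the quality and error matrices. A worked belief update with invented numbers, purely for illustration:

```python
import numpy as np

# Hypothetical numbers for a 3-state deteriorating machine
# (0 = good, 1 = degraded, 2 = bad) with two quality outcomes.
P = np.array([[0.90, 0.08, 0.02],      # deterioration between inspections
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
Q = np.array([[0.95, 0.05],            # P(true quality outcome | state)
              [0.60, 0.40],
              [0.10, 0.90]])
E = np.array([[0.9, 0.1],              # measurement-error layer:
              [0.2, 0.8]])             # P(observed outcome | true outcome)
B = Q @ E                              # P(observed outcome | state)

def update(belief, z):
    b = B[:, z] * (P.T @ belief)       # predict, then correct with noisy reading
    return b / b.sum()

print(update(np.array([1.0, 0.0, 0.0]), z=1).round(3))
```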

  • Research Article
  • Cited by: 3
  • 10.14288/1.0051546
Decision theoretic learning of human facial displays and gestures
  • Jan 1, 2004
  • James J Little + 1 more

We present a vision-based, adaptive, decision-theoretic model of human facial displays and gestures in interaction. Changes in the human face occur due to many factors, including communication, emotion, speech, and physiology. Most systems for facial expression analysis attempt to recognize one or more of these factors, resulting in a machine whose inputs are video sequences or static images and whose outputs are, for example, basic emotion categories. Our approach is fundamentally different. We make no prior commitment to some particular recognition task. Instead, we consider that the meaning of a facial display for an observer is contained in its relationship to actions and outcomes. Agents must distinguish facial displays according to their affordances, or how they help an agent to maximize utility. To this end, our system learns relationships between the movements of a person's face, the context in which they are acting, and a utility function. The model is a partially observable Markov decision process, or POMDP. The video observations are integrated into the POMDP using a dynamic Bayesian network, with decision making at the high level. The parameters of the model are learned from training data using an a posteriori constrained optimization technique based on the expectation-maximization algorithm. The training does not require labeled data, since we do not train classifiers for individual facial actions and then integrate them into the model. Rather, the learning process discovers clusters of facial motions and their relationship to the context automatically. As such, it can be applied to any situation in which non-verbal gestures are purposefully used in a task. We present an experimental paradigm in which we record two humans playing a collaborative game, or a single human playing against an automated agent, and learn the human behaviors. We use the resulting model to predict human actions. We show results on three simple games.

  • Conference Article
  • Cited by: 5
  • 10.1109/cdc.2013.6760790
Optimality conditions for total-cost Partially Observable Markov Decision Processes
  • Dec 1, 2013
  • Eugene A Feinberg + 2 more

This note describes sufficient conditions for the existence of optimal policies for Partially Observable Markov Decision Processes (POMDPs). The objective criterion is either minimization of total discounted costs or minimization of total nonnegative costs. It is well known that a POMDP can be reduced to a Completely Observable Markov Decision Process (COMDP) whose state space is the set of belief probabilities for the POMDP. Thus, a policy is optimal for the POMDP if and only if it corresponds to an optimal policy for the COMDP. Here we provide sufficient conditions for the existence of optimal policies for the COMDP and therefore for the POMDP. In particular, we consider POMDPs with weakly continuous transition probabilities and K-inf-compact cost functions that are bounded below. For fully observable MDPs, these two conditions guarantee the following three properties: (i) validity of finite-horizon and infinite-horizon optimality equations, (ii) convergence of value iterations to infinite-horizon value functions, and (iii) existence of stationary optimal policies. We show that a single additional assumption, that the observation transition probability is continuous in total variation, implies properties (i)-(iii) for the COMDP. Therefore, this condition also implies the existence of optimal policies for POMDPs. We also provide a more general, though less constructive, sufficient condition for the validity of (i)-(iii) for the COMDP, and therefore for the existence of optimal policies for a POMDP and the possibility of finding them by transforming optimal policies for the corresponding COMDP.

  • Conference Article
  • Cited by: 1
  • 10.1109/rasse53195.2021.9686829
Model-Based Performance Evaluation of Safety-Critical POMDPs
  • Dec 12, 2021
  • Parisa Pouya + 4 more

Partially Observable Markov Decision Processes (POMDPs) have been successfully employed for planning and control in safety-critical applications (e.g., autonomous vehicles) with uncertain environments. POMDP development is a subjective process and depends on assumptions inferred from available information about system-environment interactions. This subjective process can result in different designs (e.g., different state spaces), whose performance and robustness must be analyzed to choose the POMDP that best satisfies safety and performance requirements. Robustness and performance depend on accurately inferring states and providing optimal and safe responses in the presence of uncertainties, such that the goal can be achieved without violating safety requirements. These properties are typically evaluated by extensive, end-to-end testing of the developed POMDPs in simulated environments and measuring their average performance in simulated scenarios, where the measured performance relies entirely on the end results (e.g., crash or no crash) obtained from the simulated scenarios. To avoid this suboptimal process, we propose a model-based, probabilistic technique to evaluate the performance and robustness of a class of POMDPs whose states are designed to represent various high-level situations in the environment, including both goal and failure states. In this technique, the robustness and performance of designed POMDPs are evaluated by mapping the POMDPs to their belief space and estimating the extreme and expected probabilities of transitioning to failure states. Finally, we employ our technique to compare and evaluate two different POMDPs designed for controlling an autonomous vehicle (AV) in a safety-critical use-case scenario (lane keeping with risky situations and corner cases). By comparing the results obtained from our technique to an end-to-end simulation-based evaluation, we show that the proposed technique correctly identifies the POMDP with the best performance.
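
The expected-case part of such a belief-space analysis can be illustrated as an absorption-probability computation: once the policy induces a Markov chain over a discretized belief space with absorbing goal and failure states, the probability of ever reaching failure solves a linear system. A minimal sketch under that finite-discretization assumption (this is not the paper's exact estimator):

```python
import numpy as np

def failure_probability(T, failure, absorbing):
    """Probability of eventually entering a failure state, per start state,
    for a Markov chain T induced by a fixed policy on a discretized belief
    space. Solves (I - T_tt) x = T_tf @ 1 over the transient states, where
    `absorbing` contains both goal and failure indices."""
    n = T.shape[0]
    transient = [i for i in range(n) if i not in absorbing]
    T_tt = T[np.ix_(transient, transient)]
    T_tf = T[np.ix_(transient, list(failure))]
    x = np.linalg.solve(np.eye(len(transient)) - T_tt, T_tf.sum(axis=1))
    out = np.zeros(n)
    out[transient] = x
    out[list(failure)] = 1.0
    return out
```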

  • Research Article
  • Cited by: 7
  • 10.1109/tase.2021.3057111
Online Partial Conditional Plan Synthesis for POMDPs With Safe-Reachability Objectives: Methods and Experiments
  • Jul 1, 2021
  • IEEE Transactions on Automation Science and Engineering
  • Yue Wang + 4 more

The framework of partially observable Markov decision processes (POMDPs) offers a standard approach to model uncertainty in many robot tasks. Traditionally, POMDPs are formulated with optimality objectives. In this article, we study a different formulation of POMDPs with Boolean objectives. For robotic domains that require a correctness guarantee of accomplishing tasks, Boolean objectives are natural formulations. We investigate the problem of POMDPs with a common Boolean objective: safe reachability, requiring that the robot eventually reaches a goal state with a probability above a threshold while keeping the probability of visiting unsafe states below a different threshold. Our approach builds upon previous work that represents POMDPs with Boolean objectives using symbolic constraints. We employ a satisfiability modulo theories (SMT) solver to efficiently search for solutions, i.e., policies or conditional plans that specify the action to take contingent on every possible event. A full policy or conditional plan is generally expensive to compute. To improve computational efficiency, we introduce the notion of partial conditional plans that cover sampled events to approximate a full conditional plan. Our approach constructs a partial conditional plan parameterized by a replanning probability. We prove that the failure rate of the constructed partial conditional plan is bounded by the replanning probability. Our approach allows users to specify an appropriate bound on the replanning probability to balance efficiency and correctness. Moreover, we update this bound properly to quickly detect whether the current partial conditional plan meets the bound and to avoid unnecessary computation. In addition, to further improve efficiency, we cache partial conditional plans for sampled belief states and reuse these cached plans where possible. We validate our approach in several robotic domains. The results show that our approach outperforms a previous policy synthesis approach for POMDPs with safe-reachability objectives in these domains. Note to Practitioners: This article was motivated by two observations. On the one hand, in robotics applications where uncertainty in sensing and actions is present, the solution to the classical partially observable Markov decision process (POMDP) formulation is expensive to compute in general. On the other hand, in certain practical scenarios, formulations other than the classical POMDP make a lot of sense and can provide flexibility in balancing efficiency and correctness. This article considers a modified POMDP formulation that includes a Boolean objective, namely safe reachability. This article uses the notion of a partial conditional plan. Rather than explicitly enumerating all possible observations to construct a full conditional plan, this work samples a subset of all observations to ensure a bounded replanning probability. Our theoretical and empirical results show that the failure rate of the constructed partial conditional plan is bounded by the replanning probability. Moreover, these partial conditional plans can be cached to further improve performance. Our results suggest that for domains where replanning is easy, increasing the replanning probability bound usually leads to better scalability, and for domains where replanning is difficult or impossible in some states, we can decrease the bound and allocate more computation time to achieve a higher success rate. Hence, in certain cases, the practitioner can take advantage of their knowledge of the problem domain to scale to larger problems. Preliminary physical experiments suggest that this approach is applicable to real-world robotic domains, but it requires a discrete representation of the workspace. How to deal with continuous workspaces directly is an interesting future direction.

  • Conference Article
  • 10.1109/dasc.2013.96
Moving Object's Detect in a Monocular Moving Camera
  • Dec 1, 2013
  • Ye-Gang Chen + 1 more

Recently, there has been increasing interest in using mobile robots within spaces where humans reside, and safe navigation through effective sensing has become an important issue. In this paper, we describe a method to detect moving objects in front of an autonomously navigating robot by analyzing images from a monocular color camera mounted on the robot. In particular, we study detecting humans who walk in a direction opposite to the robot's motion, because this increases the danger of collision between pedestrians and the robot. Although moving object detection and obstacle avoidance have been actively studied in the fields of computer vision and intelligent robotics, respectively, analyzing the images of a moving camera is still challenging. One method presented in this paper is based on comparing the current image with a past image in order to find moving objects in the scene. Assuming that the speed of the robot is known, a correspondence between a certain part of the current image and a matching part of a past image is established. Approaching objects can then be detected where the degree of mismatch between the two corresponding image parts is high. Another method we employ is the detection of human faces. Since a human face has unique features in color and shape, we can search for faces in images in order to detect approaching humans. We propose a fast and simple masking method for face detection in a small search region specified by appearance-based foot detection. These two methods were combined to effectively find approaching humans in our experiments, with promising test results.
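
The paper's own masking method is not spelled out in the abstract; as a stand-in, the sketch below shows the same pattern (searching for frontal faces inside a small region proposed by another detector, such as the foot detector mentioned above) using OpenCV's stock Haar cascade.

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame, roi=None):
    """Search for faces, optionally restricted to a region of interest
    (x, y, w, h) such as one proposed by an appearance-based foot detector."""
    x0, y0 = 0, 0
    if roi is not None:
        x0, y0, w, h = roi
        frame = frame[y0:y0 + h, x0:x0 + w]       # crop to the search region
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Translate detections back into full-frame coordinates.
    return [(x + x0, y + y0, w, h) for (x, y, w, h) in faces]
```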

  • Research Article
  • Cited by: 2
  • 10.4236/ijis.2012.21001
Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions
  • Jan 1, 2012
  • International Journal of Intelligence Science
  • Mahdi Naser-Moghadasi

Decision making is one of the central problems in artificial intelligence, and specifically in robotics. In most cases this problem comes with uncertainty, both in the data received by the decision maker/agent and in the actions performed in the environment. One effective method for solving this problem is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). POMDPs have a wide range of applications, such as machine vision, marketing, network troubleshooting, and medical diagnosis. In recent years, there has been significant interest in developing techniques for finding policies for POMDPs. We consider two new techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF), based on the Incremental Pruning (IP) POMDP solver, to introduce an alternative to the Linear Programming (LP) filter for IP. Both RPF and SCF produced solutions for several POMDP problems that LP could not converge on within 24 hours. Experiments are run on problems from the POMDP literature, and an Average Discounted Reward (ADR) is computed by testing the policy in a simulated environment.
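
For context on what such filters do: exact POMDP value iteration keeps a set of alpha-vectors and must discard the useless ones, and the LP filter mentioned above solves a linear program per vector to decide this. A cheap pointwise-dominance pre-filter, shown below as a generic sketch (this is not RPF or SCF), removes the easy cases first.

```python
import numpy as np

def pointwise_dominance_filter(alphas):
    """Cheap pre-filter used alongside exact POMDP pruning: drop any alpha
    vector that another vector dominates at every state. A full filter
    (LP-based or otherwise) is still needed for the remaining vectors."""
    kept = []
    for i, a in enumerate(alphas):
        dominated = any(
            j != i and np.all(b >= a) and np.any(b > a)
            for j, b in enumerate(alphas))
        if not dominated:
            kept.append(a)
    return kept
```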
