Abstract

Chapter 12 presented three structural results for stopping time POMDPs: convexity of the stopping region (for linear costs), the existence of a threshold switching curve for the optimal policy (under suitable conditions), and characterization of the optimal linear threshold policy. This chapter discusses several examples of stopping time POMDPs in quickest change detection. We will show that for these examples, convexity of the stopping set and threshold optimal policies arise naturally. The structural results of Chapter 12 therefore serve as a unifying theme and give substantial insight into what might otherwise be considered a collection of unrelated sequential detection methods. This chapter considers the following extensions of quickest change detection:

• Example 1: Quickest change detection with phase-type (PH) distributed change time. Classical quickest detection is equivalent to a stopping time POMDP where the underlying Markov chain jumps only once into an absorbing state, so the jump time is geometrically distributed (a minimal simulation sketch of this baseline appears after the abstract). How should quickest change detection be performed when the change time is phase-type distributed and the stopping cost is quadratic in the belief state, so as to penalize the variance of the state estimate?

• Example 2: Quickest transient detection. If the state of nature jumps into a state and then jumps out of it, how should quickest detection of this transient be performed? The problem is equivalent to a stopping time POMDP where the Markov chain jumps only twice.

• Example 3: Risk-sensitive quickest detection. How should quickest detection be performed with an exponential penalty?

• Example 4: Quickest detection with social learning. If individual agents learn an underlying state by performing social learning, how can quickest change detection be performed by a global decision-maker? As will be shown, this interaction of local and global decision-makers results in interesting non-monotone behavior, and the stopping set is not necessarily convex.

• Example 5: Quickest time herding with social learning. How should a decision-maker estimate an underlying state of nature when agents herd while performing social learning?

• Example 6: How should a monopolist optimally price a product when customers perform social learning? Each time a customer buys the product, the monopolist makes money and also gains publicity due to social learning. It is shown that it is optimal to start at a high price and then decrease the price over time.
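To make the baseline of Example 1 concrete, the following is a minimal sketch (not from the chapter) of classical Bayesian quickest detection viewed as a stopping time POMDP: a two-state Markov chain jumps once into an absorbing change state, an HMM filter propagates the belief state, and the policy stops when the posterior probability of change crosses a threshold. The transition probability rho, the Gaussian observation densities, and the threshold value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain: state 0 = pre-change, state 1 = post-change (absorbing).
# The single jump 0 -> 1 with probability rho per step makes the change time
# geometrically distributed.
rho = 0.05
P = np.array([[1 - rho, rho],
              [0.0,     1.0]])

def likelihood(y):
    """Observation densities per state (assumed: N(0,1) pre-change, N(1,1) post-change)."""
    pre  = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)
    post = np.exp(-0.5 * (y - 1)**2) / np.sqrt(2 * np.pi)
    return np.array([pre, post])

def shiryaev_stop(ys, threshold=0.9):
    """Run the HMM filter on the belief state; stop when P(change) >= threshold."""
    pi = np.array([1.0, 0.0])            # initial belief: chain starts pre-change
    for k, y in enumerate(ys):
        pi = likelihood(y) * (P.T @ pi)  # Bayesian filter: predict, then correct
        pi /= pi.sum()
        if pi[1] >= threshold:           # threshold policy on the belief state
            return k, pi[1]
    return len(ys), pi[1]

# Hypothetical run: the change actually occurs at sample 40.
ys = np.concatenate([rng.normal(0, 1, 40), rng.normal(1, 1, 60)])
k, p = shiryaev_stop(ys)
print(f"declared change at sample {k}, posterior {p:.3f}")
```

With linear delay and false-alarm costs, the optimal policy for this problem is exactly such a threshold on the belief, which is the structure Chapter 12 establishes in general.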

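The same filter covers the generalizations in Examples 1 and 2: a phase-type distributed change time corresponds to inserting several transient pre-change phases before the absorbing change state, and transient detection corresponds to a chain that jumps in and then out of a state (so the post-change state is no longer absorbing). A hedged sketch of the phase-type case, with assumed transition probabilities and observation densities:

```python
import numpy as np

# Pre-change period modeled by transient phases (states 0 and 1); the change
# state (state 2) is absorbing. The change time is the absorption time, which
# is phase-type distributed. Entries below are illustrative, not from the chapter.
P = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

def likelihood(y):
    """Both pre-change phases share one density (assumed N(0,1)); post-change is N(1,1)."""
    pre  = np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)
    post = np.exp(-0.5 * (y - 1)**2) / np.sqrt(2 * np.pi)
    return np.array([pre, pre, post])

def filter_step(pi, y):
    """One HMM filter step; identical in form to the two-state case."""
    pi = likelihood(y) * (P.T @ pi)
    return pi / pi.sum()

# The stopping set is now a region of the 3-dimensional belief simplex; the
# chapter's structural results give conditions under which its boundary is a
# threshold switching curve rather than a single scalar threshold.
pi = np.array([1.0, 0.0, 0.0])
for y in [0.1, 0.5, 1.2, 0.9]:   # a few hypothetical observations
    pi = filter_step(pi, y)
print("posterior over phases and change state:", pi)
```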