Abstract

Numerous real-world decision and control problems involve multiple conflicting objectives whose relative importance (preference) must be weighed differently in different scenarios. While Pareto optimality is desired, environmental uncertainties (e.g., environmental changes or observation noise) may mislead the agent into executing suboptimal policies. In this article, we present a novel multiobjective optimization paradigm, robust multiobjective reinforcement learning (RMORL), which accounts for environmental uncertainties and trains a single model that approximates robust Pareto-optimal policies across the entire preference space. To enhance policy robustness against environmental changes, an environmental disturbance is modeled as an adversarial agent over the entire preference space by incorporating a zero-sum game into a multiobjective Markov decision process (MOMDP). Additionally, we devise an adversarial defense technique against observational perturbations, which ensures that policy variations caused by adversarial attacks on state observations remain bounded under any specified preference. The proposed approach is evaluated in five multiobjective environments with continuous action spaces, and its effectiveness is demonstrated through comparisons with competitive baselines, including classical and state-of-the-art schemes.
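
For concreteness, the sketch below illustrates how a preference-conditioned zero-sum training objective with an observation-perturbation regularizer might be assembled. The class and function names (PreferenceConditionedPolicy, DisturbanceAdversary, observation_smoothness_loss), the network architectures, the linear scalarization, and the random-perturbation stand-in for an adversarial observation attack are illustrative assumptions rather than the paper's actual implementation; only standard PyTorch primitives are used.

```python
import torch
import torch.nn as nn

# Illustrative preference-conditioned policy: maps (state, preference) -> action.
class PreferenceConditionedPolicy(nn.Module):
    def __init__(self, state_dim, pref_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + pref_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, pref):
        return self.net(torch.cat([state, pref], dim=-1))


# Illustrative zero-sum adversary: produces a bounded environmental
# disturbance, also conditioned on the preference vector.
class DisturbanceAdversary(nn.Module):
    def __init__(self, state_dim, pref_dim, hidden=64, max_dist=0.1):
        super().__init__()
        self.max_dist = max_dist
        self.net = nn.Sequential(
            nn.Linear(state_dim + pref_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim), nn.Tanh(),
        )

    def forward(self, state, pref):
        return self.max_dist * self.net(torch.cat([state, pref], dim=-1))


def observation_smoothness_loss(policy, state, pref, eps=0.05):
    """Penalize how far the action moves under a bounded observation
    perturbation (a random perturbation of norm <= eps is used here as a
    cheap stand-in for a worst-case adversarial attack)."""
    delta = torch.randn_like(state)
    delta = eps * delta / (delta.norm(dim=-1, keepdim=True) + 1e-8)
    return ((policy(state, pref) - policy(state + delta, pref)) ** 2).sum(dim=-1).mean()


if __name__ == "__main__":
    S, P, A = 8, 2, 2                      # state dim, number of objectives, action dim (illustrative)
    policy = PreferenceConditionedPolicy(S, P, A)
    adversary = DisturbanceAdversary(S, P)

    state = torch.randn(32, S)
    pref = torch.softmax(torch.randn(32, P), dim=-1)   # preference weights on the simplex

    # Placeholder two-objective "return": in a full method these values come
    # from rollouts in the MOMDP with the adversary's disturbance applied to
    # the dynamics; synthetic quantities keep the sketch self-contained.
    disturbed_state = state + adversary(state, pref)
    action = policy(disturbed_state, pref)
    objectives = torch.stack([-(action ** 2).sum(-1), -(disturbed_state ** 2).sum(-1)], dim=-1)
    scalarized_return = (pref * objectives).sum(dim=-1).mean()   # linear scalarization by preference

    reg = observation_smoothness_loss(policy, state, pref)
    # In a real training loop each loss is optimized only w.r.t. its own parameters.
    policy_loss = -scalarized_return + 1.0 * reg   # policy: maximize return, keep actions smooth
    adversary_loss = scalarized_return             # adversary: minimize the same return (zero-sum)
    print(policy_loss.item(), adversary_loss.item())
```

The zero-sum structure shows up in the opposite signs of policy_loss and adversary_loss, while the smoothness term bounds how much the action can change under a small observation perturbation at any sampled preference.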
