Abstract

This article presents a novel intermittent learning scheme, based on Skinner's operant conditioning techniques, that approximates the optimal policy while reducing the use of the communication buses that transfer information. Traditional reinforcement learning schemes continuously evaluate, and subsequently improve, every action taken by a learning agent based on the reinforcement signals it receives. This continuous transmission of reinforcement and policy improvement signals can overutilize the system's inherently limited resources. Moreover, the highly complex operating environment of cyber-physical systems (CPSs) creates an opening for malicious individuals to corrupt the signal transmissions between components. The proposed schemes will increase uncertainty in the learning rate and the extinction rate of the acquired behavior of the learning agents. In this article, we investigate the use of fixed/variable interval and fixed/variable ratio schedules in CPSs, along with their rate of success and the loss in optimal behavior incurred during intermittent learning. Simulation results show the efficacy of the proposed approach.
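To make the four schedules named above concrete, the following is a minimal Python sketch of when a reinforcement signal would be transmitted under each one. It follows the standard operant-conditioning definitions (interval schedules count elapsed time, ratio schedules count actions) and is not taken from the article itself. The function name `transmission_steps`, the labels "FI", "VI", "FR", "VR", and the uniform draw used for the variable schedules are all assumptions made for illustration.

```python
import random


def transmission_steps(kind, param, acted, seed=0):
    """Return the time steps at which a reinforcement signal is sent.

    kind  : "FI", "VI", "FR", or "VR" (hypothetical labels for
            fixed/variable interval and fixed/variable ratio)
    param : interval length in time steps, or ratio in number of actions;
            for the variable schedules it is the mean of the random threshold
    acted : sequence of booleans, acted[t] is True if the agent acted at step t
    """
    if kind not in ("FI", "VI", "FR", "VR"):
        raise ValueError(f"unknown schedule: {kind}")

    rng = random.Random(seed)
    variable = kind in ("VI", "VR")
    by_time = kind in ("FI", "VI")

    def next_threshold():
        # Assumption: variable schedules draw the threshold uniformly from
        # 1..2*param-1, whose mean is param. Fixed schedules always use param.
        return rng.randint(1, 2 * param - 1) if variable else param

    threshold = next_threshold()
    counter = 0  # elapsed steps (interval) or actions taken (ratio)
    sent = []
    for t, did_act in enumerate(acted):
        # Interval schedules count every step; ratio schedules count actions.
        if by_time or did_act:
            counter += 1
        # A signal is only sent in response to an action.
        if did_act and counter >= threshold:
            sent.append(t)
            counter = 0
            threshold = next_threshold()
    return sent


if __name__ == "__main__":
    always = [True] * 12          # the agent acts at every step
    sparse = [True, False] * 6    # the agent acts at every other step
    print("FI, always:", transmission_steps("FI", 4, always))
    print("FR, always:", transmission_steps("FR", 3, always))
    print("FI, sparse:", transmission_steps("FI", 4, sparse))
    print("FR, sparse:", transmission_steps("FR", 3, sparse))
    print("VR, always:", transmission_steps("VR", 3, always, seed=1))
```

In this sketch, every schedule transmits on only a fraction of the steps, which is the bus-usage saving the abstract refers to. The variable schedules make the transmission times unpredictable to an observer, at the cost of the uncertainty in learning and extinction rates that the abstract mentions.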
