Abstract

This paper proposes a general framework for detecting unsafe states of a system whose basic real-time parameters are captured by multiple sensors. Our approach is to learn a danger-level function that alerts users to dangerous situations in advance, so that measures can be taken to avoid a collapse. The main challenge in this learning problem is labeling: it is difficult to assign an objective danger level to each time step of the training data, except at collapse points, where a definitive penalty can be assigned, and at successful ends, where a certain reward can be assigned. We treat the danger level as an expected future reward (a penalty is regarded as a negative reward) and use temporal difference (TD) learning to learn a function that approximates the expected future reward given the current and historical sensor readings. TD learning obtains this approximation by propagating the penalties and rewards observable at collapse points and successful ends to the entire feature space, subject to some constraints. This avoids the labeling issue and naturally yields a general framework for detecting unsafe states. We apply the approach to driving-safety monitoring, although it is not limited to that application, and the experimental results demonstrate its effectiveness.
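
The propagation idea can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes plain TD(0) with a linear value function over one-hot features, and the state names, the rewards of -1 (collapse) and +1 (successful end), the learning rate, and the discount factor are all hypothetical choices made for illustration. Only the final step of each episode carries a label; every earlier step gets its value from its successor.

```python
# Hypothetical one-hot "sensor states" for a toy driving example.
CRUISING, TAILGATING, HARD_BRAKING, STEADY = range(4)
N_FEATURES = 4


def one_hot(index, size=N_FEATURES):
    """Return a one-hot feature vector for a state index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_danger_level(episodes, n_features, alpha=0.1, gamma=1.0, sweeps=200):
    """Learn weights w so that dot(w, x) approximates the expected future reward.

    Each episode is (list_of_feature_vectors, terminal_reward). The reward is
    observed only when the episode ends (collapse or successful end).
    """
    w = [0.0] * n_features
    for _ in range(sweeps):
        for states, terminal_reward in episodes:
            for t, x in enumerate(states):
                if t + 1 < len(states):
                    # Unlabeled step: bootstrap from the next step's estimate.
                    target = gamma * dot(w, states[t + 1])
                else:
                    # Last step: the only place a label is available.
                    target = terminal_reward
                delta = target - dot(w, x)  # TD error
                w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return w


# Two assumed training episodes: one ends in a collapse, one ends successfully.
episodes = [
    ([one_hot(CRUISING), one_hot(TAILGATING), one_hot(HARD_BRAKING)], -1.0),
    ([one_hot(CRUISING), one_hot(STEADY)], +1.0),
]

weights = td0_danger_level(episodes, N_FEATURES)
for name, idx in [("cruising", CRUISING), ("tailgating", TAILGATING),
                  ("hard_braking", HARD_BRAKING), ("steady", STEADY)]:
    print(f"{name:13s} estimated future reward: {dot(weights, one_hot(idx)):+.3f}")
```

In this toy setting the tailgating state is never labeled directly, yet its estimate is pulled toward the collapse penalty because it always precedes hard braking. The cruising state, which appears in both episodes, ends up near the middle. A low (strongly negative) estimate would then be read as a high danger level.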
