• A bi-stream convolutional neural network is developed for human action recognition.
• Object information and a transition class significantly improve recognition accuracy.
• A variable-length Markov model captures the causal dependencies embedded in action sequences.
• High action prediction accuracy is achieved with human-like prediction logic.

As one of the critical elements of smart manufacturing, human-robot collaboration (HRC), which refers to goal-oriented joint activities of humans and collaborative robots in a shared workspace, has gained increasing attention in recent years. HRC is envisioned to break the traditional barrier that separates human workers from robots and to greatly improve operational flexibility and productivity. To realize HRC, a robot needs to recognize and predict human actions in order to provide assistance in a safe and collaborative manner. This paper presents a hybrid approach to context-aware human action recognition and prediction, based on the integration of a convolutional neural network (CNN) and variable-length Markov modeling (VMM). Specifically, a bi-stream CNN structure parses the human and object information embedded in video images as the spatial context for action recognition and collaboration context identification. The dependencies embedded in the action sequences are subsequently analyzed by a VMM, which adaptively determines the optimal number of current and past actions to consider in order to maximize the probability of accurately predicting future actions. The effectiveness of the developed method is evaluated experimentally on a testbed that simulates an assembly environment. High accuracy in both action recognition and prediction is demonstrated.
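To make the bi-stream idea concrete, the following is a minimal sketch in PyTorch of a two-stream network that fuses a human-region stream and an object-region stream for action classification. The backbone layers, feature dimensions, crop sizes, and class count are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a bi-stream CNN for context-aware action recognition.
# Assumptions (not from the paper): 3x112x112 input crops, 256-d features,
# identical backbones for both streams, and 7 action classes.
import torch
import torch.nn as nn

class StreamBackbone(nn.Module):
    """Small convolutional backbone applied to one image stream."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # global pooling to (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))

class BiStreamCNN(nn.Module):
    """Two parallel streams, one for the human region and one for the object
    region; their features are concatenated and classified into actions."""
    def __init__(self, num_actions: int, feat_dim: int = 256):
        super().__init__()
        self.human_stream = StreamBackbone(feat_dim)
        self.object_stream = StreamBackbone(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, human_crop, object_crop):
        h = self.human_stream(human_crop)
        o = self.object_stream(object_crop)
        return self.classifier(torch.cat([h, o], dim=1))

# Usage: classify a batch of paired human/object crops into action classes
# (one of which could serve as the "transition" class between steps).
model = BiStreamCNN(num_actions=7)
logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 3, 112, 112))
```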
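The VMM-based prediction step can be sketched similarly. Below is a minimal pure-Python illustration of variable-length Markov prediction: it counts next-action frequencies for contexts up to a maximum order and, at prediction time, adaptively falls back from the longest observed context to shorter ones. The maximum order, the longest-suffix selection rule, and the toy action labels are assumptions for illustration; the paper's own criterion for choosing the context length may differ.

```python
# Minimal sketch of variable-length Markov prediction over action sequences.
# Assumption (not from the paper): the model selects the longest suffix of the
# observed history seen in training, then predicts the most probable next action.
from collections import Counter, defaultdict

def train_vmm(sequences, max_order=4):
    """Count next-action frequencies for every context up to max_order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break
                context = tuple(seq[i - k:i])
                counts[context][seq[i]] += 1
    return counts

def predict_next(counts, history, max_order=4):
    """Adaptively use the longest matching context; fall back to shorter
    contexts when a longer one was never observed in training."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in counts:
            nxt, n = counts[context].most_common(1)[0]
            total = sum(counts[context].values())
            return nxt, n / total, k  # prediction, probability, context length
    return None, 0.0, 0

# Toy example with hypothetical action labels from an assembly task.
train = [["pick", "place", "screw", "inspect"],
         ["pick", "place", "screw", "place"]]
counts = train_vmm(train)
print(predict_next(counts, ["pick", "place", "screw"]))
```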