Touch is one of the most essential and effective means of conveying affective feelings and intentions in human communication. For a social robot, the ability to recognize human touch gestures and emotions could enable efficient and natural human–robot interaction. To this end, an affective touch gesture dataset covering ten touch gestures and twelve discrete emotions was built using a pressure sensor array; the acquired touch gesture samples are three-dimensional (3-D) spatiotemporal signals that capture both shape appearance and motion dynamics. Owing to the excellent performance of convolutional neural networks (CNNs), spatiotemporal CNNs have been shown by researchers to be effective for 3-D signal classification. However, the large number of parameters and the high training complexity of 3-D convolution kernels remain open problems. In this article, a decomposed spatiotemporal convolution was designed to learn feature representations from the raw touch gesture samples. Specifically, the 3-D kernel was factorized into three 1-D kernels by tensor decomposition. The proposed convolution has a simpler but deeper architecture than standard 3-D convolution, which improves the nonlinear expressive ability of the model. Moreover, the computational cost is reduced without compromising recognition accuracy. In a user-dependent test mode, the proposed method achieves accuracies of up to 92.41% and 72.47% for touch gesture and emotion recognition, respectively. Experimental results demonstrate the effectiveness of the proposed method and, at the same time, preliminarily verify the feasibility of a robot perceiving human emotions through touch.
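The core idea of the factorization can be illustrated with a minimal NumPy sketch (this is not the paper's code, and the paper's actual kernels are learned): for a rank-1 3-D kernel, applying three 1-D convolutions, one along each axis, gives the same result as a full 3-D convolution while storing 3k parameters instead of k^3.

```python
import numpy as np
from scipy.signal import convolve

# Hypothetical illustration: factorize a 3x3x3 kernel into three 1-D kernels
# along the temporal, height, and width axes (rank-1 case of the decomposition).
rng = np.random.default_rng(0)
kt, kh, kw = (rng.standard_normal(3) for _ in range(3))
k3d = np.einsum('i,j,l->ijl', kt, kh, kw)   # full 3-D kernel: 27 parameters

x = rng.standard_normal((8, 8, 8))          # toy 3-D spatiotemporal volume

def conv1d_along(vol, k, axis):
    # 'valid' 1-D convolution applied independently along one axis
    return np.apply_along_axis(lambda s: np.convolve(s, k, mode='valid'),
                               axis, vol)

# Decomposed: three cheap 1-D passes (3 + 3 + 3 = 9 parameters)
y_sep = conv1d_along(conv1d_along(conv1d_along(x, kt, 0), kh, 1), kw, 2)

# Reference: direct 3-D 'valid' convolution with the full 27-parameter kernel
y_full = convolve(x, k3d, mode='valid')

print(np.allclose(y_sep, y_full))  # → True
```

In a network, each 1-D convolution is typically followed by its own nonlinearity, which is why the decomposed form is deeper and more expressive than a single 3-D layer of the same receptive field, not merely cheaper.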