Abstract

Human action and interaction recognition has wide applications in video surveillance. The key issue in this topic is how to represent spatial and temporal information. This paper proposes a framework called Long-term Residual Recurrent Network (LRRN) for human interaction recognition. The framework combines spatial and temporal features: spatial features are generated by a Residual Network (ResNet), and temporal features are learned by a Long Short-Term Memory (LSTM) network. The spatio-temporal feature representation learned automatically by LRRN is more expressive than hand-crafted counterparts. Optical flow image sequences are used as input to reduce interference from the static background. Experiments are conducted on the BIT-interaction and UT-interaction datasets. The results show higher accuracy than prior traditional methods, with state-of-the-art accuracies of 90% and 98.33% on the two datasets, respectively.
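The temporal half of such a pipeline can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a fixed-length feature vector (standing in for ResNet features computed on optical flow images) and folds the sequence into one clip-level vector with a standard LSTM cell. The function names, dimensions, and random weights are hypothetical and chosen for illustration only; this is not the paper's implementation and involves no training.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(input_dim, hidden, seed=0):
    # Hypothetical random weights for the four LSTM gates: input (i),
    # forget (f), output (o) and candidate (g). Each matrix has shape
    # (hidden, input_dim + hidden) and acts on the concatenation [x; h].
    rng = random.Random(seed)
    W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden)]
             for _ in range(hidden)] for k in "ifog"}
    b = {k: [0.0] * hidden for k in "ifog"}
    return W, b

def lstm_step(x, h, c, W, b):
    # One step of a standard LSTM cell on plain Python lists.
    z = x + h  # list concatenation: [x; h]

    def affine(name):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[name], b[name])]

    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    o = [sigmoid(v) for v in affine("o")]
    g = [math.tanh(v) for v in affine("g")]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def encode_clip(frame_features, W, b, hidden):
    # Run the LSTM over per-frame feature vectors (assumed to come from a
    # spatial network) and return the final hidden state as the clip feature.
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in frame_features:
        h, c = lstm_step(list(x), h, c, W, b)
    return h

if __name__ == "__main__":
    W, b = init_params(input_dim=4, hidden=3)
    frames = [[0.1 * t] * 4 for t in range(5)]  # 5 dummy frame features
    print(encode_clip(frames, W, b, hidden=3))
```

In a full system, the final hidden state would typically feed a classifier over the interaction classes, and the per-frame vectors would come from a trained ResNet rather than dummy values.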
