In this paper we raise two important questions: (1) Is temporal information beneficial for recognizing actions in still images? (2) Do we know how to take maximum advantage of it? To answer these questions, we propose a novel transfer learning problem, Temporal To Still Image Learning (T2SIL), in which we learn to derive temporal information from still images. We then use a two-stream model in which still image action predictions are fused with the derived temporal predictions. In T2SIL, knowledge is transferred from the temporal representations of videos (e.g., optical flow, Dynamic Image representations) to still action images. Alongside T2SIL, we propose a new still image action dataset and a video dataset sharing the same set of classes. We explore three well-established transfer learning frameworks (GANs, embedding learning, and Teacher-Student Networks (TSNs)) as candidate methods for the temporal knowledge transfer. The use of temporal information derived by our TSN and embedding learning improves still image action recognition.
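To make the two-stream fusion concrete, the following is a minimal sketch of weighted late fusion of the two prediction streams. The class name `TwoStreamFusion`, the pluggable `spatial_stream`/`temporal_stream` backbones, and the fusion weight `alpha` are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Illustrative sketch: fuses action predictions from a still image
    (spatial) stream with predictions from a derived temporal stream.
    Backbones and fusion weighting are assumptions for illustration."""

    def __init__(self, spatial_stream: nn.Module, temporal_stream: nn.Module,
                 alpha: float = 0.5):
        super().__init__()
        self.spatial_stream = spatial_stream    # predicts actions from the RGB still image
        self.temporal_stream = temporal_stream  # predicts actions from temporal cues derived from the same image
        self.alpha = alpha                      # fusion weight (assumed hyperparameter)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        spatial_logits = self.spatial_stream(image)
        temporal_logits = self.temporal_stream(image)
        # Weighted late fusion of the two prediction streams.
        return self.alpha * spatial_logits + (1 - self.alpha) * temporal_logits
```

Both streams take the same still image as input, since in T2SIL the temporal cues are themselves derived from the still image rather than from video frames.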