Abstract

Multimedia event detection (MED) in uncontrolled videos typically requires a very large number of labeled videos to train the event classifier, which becomes especially challenging when many events must be detected. Because an event usually involves several spatio-temporal objects, one intuitive solution is to model those objects from a large number of labeled images, which are readily available in standard image datasets such as the ImageNet challenge dataset, and to model their spatio-temporal relationships from a relatively small number of labeled videos, which are likewise available in standard video datasets such as the TRECVID MED 2012 dataset. Accordingly, in this paper we propose a latent group logistic regression (latent GLR) mixture model for those objects and an event bank descriptor for their spatio-temporal relationships. Furthermore, we develop an efficient iterative training algorithm that learns the parameters of each latent GLR mixture model by combining coordinate descent and gradient descent to minimize the ℓ2,1-norm (group) regularized logistic loss function. We also conduct extensive experiments to evaluate the object detection performance of the latent GLR mixture model on the ImageNet challenge dataset and the event detection performance of the event bank descriptor on the TRECVID MED 2012 dataset. The results demonstrate the effectiveness of both proposed approaches.
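To make the optimization objective concrete, the sketch below illustrates one common way to minimize an ℓ2,1-norm (group-lasso) regularized logistic loss: block-coordinate updates over feature groups, each consisting of a gradient step on the smooth logistic term followed by group soft-thresholding. This is only an illustrative sketch under generic assumptions, not the authors' latent GLR training algorithm; the function names (group_logistic_loss, fit_group_logreg), the toy data, and the fixed step size are all hypothetical.

import numpy as np

def group_logistic_loss(w, X, y, groups, lam):
    """Logistic loss plus an l2,1 (group-lasso) penalty over feature groups."""
    z = X @ w
    loss = np.mean(np.log1p(np.exp(-y * z)))          # labels y in {-1, +1}
    penalty = lam * sum(np.linalg.norm(w[g]) for g in groups)
    return loss + penalty

def fit_group_logreg(X, y, groups, lam=0.1, lr=0.1, n_iter=200):
    """Block-coordinate proximal gradient descent: cycle over groups,
    take a gradient step on the smooth logistic loss for that block,
    then apply the group soft-thresholding (proximal) operator."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for g in groups:
            z = X @ w
            # gradient of the logistic loss w.r.t. the coordinates in group g
            sigma = 1.0 / (1.0 + np.exp(y * z))
            grad_g = -(X[:, g] * (y * sigma)[:, None]).mean(axis=0)
            w_g = w[g] - lr * grad_g
            # group soft-thresholding shrinks the whole block toward zero,
            # which is what the l2,1 penalty induces (group sparsity)
            norm = np.linalg.norm(w_g)
            w[g] = max(0.0, 1.0 - lr * lam / norm) * w_g if norm > 0 else 0.0
    return w

# toy usage: 6 features split into 3 groups of 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=100))
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
w = fit_group_logreg(X, y, groups)
print(np.round(w, 3), group_logistic_loss(w, X, y, groups, 0.1))

In this toy run only the first group carries signal, so the penalty tends to drive the other groups toward zero; the paper's latent variables and mixture components would sit on top of this kind of group-regularized objective.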
