Abstract

Since the number of videos on the internet is huge and continuously growing, it is impossible to pre-index all events in these videos. Thus, we extract the definition of each event from example videos provided as a query. However, unlike positive examples, it is impractical to manually provide a representative variety of negative examples. Hence, we use "partially supervised learning," in which the definition of the event is extracted from positive and unlabeled examples. Specifically, negative examples are first selected based on similarities between positive and unlabeled examples. To calculate these similarities appropriately, we use a "video mask," which represents relevant features based on a typical layout of objects in the event. Then, we extract the event definition from positive and negative examples. In this process, we consider that shots of the same event can contain significantly different features due to varying camera techniques and object movements. To cover such a large variation of features, we use "rough set theory" to extract multiple definitions of the event. Experimental results on the TRECVID 2008 video collection validate the effectiveness of our method.
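As a rough illustration of the negative-selection step (not the paper's exact procedure), the sketch below scores each unlabeled shot by its similarity to the positive examples over mask-weighted features and takes the least similar shots as pseudo-negatives. The cosine similarity, the binary mask representation, and all function and parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def select_pseudo_negatives(positives, unlabeled, mask, n_neg):
    """Pick the unlabeled shots least similar to the positive examples.

    positives : (P, D) array of shot feature vectors (positive examples)
    unlabeled : (U, D) array of shot feature vectors (unlabeled examples)
    mask      : (D,) 0/1 weights marking features relevant to the event
                (a stand-in for the paper's "video mask")
    n_neg     : number of pseudo-negative examples to select
    """
    # Keep only the features the mask marks as relevant.
    pos = positives * mask
    unl = unlabeled * mask

    # Cosine similarity of each unlabeled shot to its nearest positive
    # (an assumed similarity measure, chosen here for simplicity).
    pos_norm = pos / (np.linalg.norm(pos, axis=1, keepdims=True) + 1e-12)
    unl_norm = unl / (np.linalg.norm(unl, axis=1, keepdims=True) + 1e-12)
    sim_to_nearest_pos = (unl_norm @ pos_norm.T).max(axis=1)

    # The least similar unlabeled shots are taken as pseudo-negatives.
    neg_idx = np.argsort(sim_to_nearest_pos)[:n_neg]
    return unlabeled[neg_idx], neg_idx
```

In this reading, the video mask simply zeroes out features that fall outside the typical object layout of the event, so that similarity is computed only over event-relevant regions.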

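The rough-set step could be pictured along the following lines: a greedy covering routine that induces several conjunctive rules over binarized shot features, where each rule matches some positive shots and no negative shots, so that together the rules act as multiple definitions of the event. This is a generic simplification of rough-set decision-rule induction, not the authors' algorithm; the binarization, the scoring heuristic, and all names are assumptions.

```python
import numpy as np

def extract_rules(pos_bits, neg_bits, max_rules=10):
    """Greedily induce conjunctive rules, each holding for some positive
    shots and for no negative shots (a simplified stand-in for rough-set
    decision-rule extraction).

    pos_bits, neg_bits : boolean arrays of shape (num_shots, num_features).
    Returns a list of rules; each rule is a list of feature indices whose
    conjunction must hold.
    """
    uncovered = np.ones(len(pos_bits), dtype=bool)
    rules = []
    while uncovered.any() and len(rules) < max_rules:
        rule = []
        pos_ok = uncovered.copy()                    # positives the rule matches
        neg_ok = np.ones(len(neg_bits), dtype=bool)  # negatives it still matches
        while neg_ok.any():
            # Greedily add the feature that keeps the most positives
            # while dropping the most negatives.
            scores = ((pos_bits & pos_ok[:, None]).sum(0)
                      - (neg_bits & neg_ok[:, None]).sum(0))
            f = int(np.argmax(scores))
            new_neg = neg_ok & neg_bits[:, f]
            if f in rule or new_neg.sum() == neg_ok.sum():
                break                                # no progress; give up on this rule
            rule.append(f)
            pos_ok &= pos_bits[:, f]
            neg_ok = new_neg
        if not neg_ok.any() and pos_ok.any():
            rules.append(rule)
            uncovered &= ~pos_ok                     # these positives are now covered
        else:
            break
    return rules
```

Each extracted rule covers a different subset of the positive shots, which is one way to accommodate the large feature variation across shots of the same event caused by different camera techniques and object movements.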