Abstract

As a local invariant feature of videos, the spatiotemporal interest point (STIP) has been widely used in computer vision and pattern recognition. However, existing STIP detectors are generally extended from detection algorithms designed for local invariant features of two-dimensional images, and therefore do not explicitly exploit the motion information inherent in the temporal domain of videos, which weakens their performance in a video context. To remedy this, we aim to develop an STIP detector that uniformly captures appearance and motion information in video, yielding a substantial performance improvement. Specifically, under the framework of geometric algebra, we first develop a spatiotemporal unified model of appearance and motion-variation information (UMAMV), and then propose a UMAMV-based scale space of the spatiotemporal domain to synthetically analyze appearance and motion information in a video. Based on this model, we propose an STIP feature, UMAMV-SIFT, that embraces both the appearance and the motion-variation information of a video. Three datasets of different sizes are used to evaluate the proposed model and the STIP detector. Experimental results show that UMAMV-SIFT achieves state-of-the-art performance and is particularly effective when the dataset is small.
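The abstract gives no implementation details, so the following Python sketch is only an illustration of the general pipeline it describes: fuse per-voxel appearance and motion-variation cues into one response, build a spatiotemporal scale space, and search for SIFT-style difference-of-Gaussians extrema. Folding the two cues into a single scalar magnitude is a loud simplification of the paper's geometric-algebra multivector model, and names such as `unified_response` and `stip_candidates` are hypothetical.

```python
# Hedged sketch, not the authors' method: a SIFT-like extrema search over a
# video volume using a simplified stand-in for the UMAMV representation.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def unified_response(video, alpha=1.0):
    """Combine appearance and motion-variation cues per voxel.

    video : float array of shape (T, H, W), grayscale frames in [0, 1].
    alpha : hypothetical weight of the motion-variation component.
    """
    gt = np.gradient(video, axis=0)  # temporal gradient ~ motion variation
    # A true UMAMV would keep appearance and motion as separate components of
    # a geometric-algebra multivector; here we fold them into one magnitude.
    return np.sqrt(video ** 2 + (alpha * gt) ** 2)

def stip_candidates(video, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.03):
    """SIFT-style difference-of-Gaussians extrema in the (t, y, x) volume."""
    r = unified_response(video)
    # Spatiotemporal Gaussian scale space (isotropic across t, y, x here,
    # purely for simplicity).
    scales = [gaussian_filter(r, s) for s in sigmas]
    dogs = np.stack([b - a for a, b in zip(scales, scales[1:])])
    points = []
    for i in range(1, dogs.shape[0] - 1):  # interior scales only
        local_max = maximum_filter(dogs[i], size=3) == dogs[i]
        strong = np.abs(dogs[i]) > thresh
        # Keep voxels that are also maxima across adjacent scales.
        across = dogs[i] > np.maximum(dogs[i - 1], dogs[i + 1])
        for t, y, x in zip(*np.nonzero(local_max & strong & across)):
            points.append((t, y, x, sigmas[i]))
    return points

# Usage: a random 16-frame, 64x64 clip yields a (t, y, x, sigma) list.
video = np.random.rand(16, 64, 64).astype(np.float64)
print(len(stip_candidates(video)))
```

Any faithful reproduction would replace `unified_response` with the paper's multivector construction and its UMAMV-based scale space; the extrema search above only mirrors the familiar SIFT machinery that the UMAMV-SIFT name suggests.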
