Abstract

Recently, the prediction of interactions in videos has been an active subject in computer vision. Its goal is to infer an interaction during its early stages, before it has fully unfolded. Many approaches have been proposed for interaction prediction, but it remains a challenging problem. In the present paper, the features are optical flow fields extracted from video frames using convolutional neural networks. These features, extracted from successive frames, form a time series, so the problem is modeled as time series prediction. The interaction type is predicted by matching the time series of the test video against the time series available in the training set. Dynamic time warping provides an optimal match between a pair of time series through a nonlinear alignment of the two sequences. Finally, SVM and KNN classifiers with the dynamic time warping distance are used to predict the video label. The results show that the proposed model improves on standard interaction recognition datasets, including TVHI, BIT, and UT-Interaction.
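To illustrate the matching step described above, the following is a minimal sketch of nearest-neighbour classification under the dynamic time warping distance. It assumes each video has already been converted into a sequence of per-frame feature vectors (e.g. CNN descriptors of optical flow fields); the feature extraction itself, the function names, and the choice of k are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (T_a, d) and (T_b, d), one d-dimensional
    per-frame feature vector (e.g. a CNN descriptor of an optical
    flow field) per time step.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local frame-to-frame cost
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # allow match, insertion, or deletion: this gives the
            # nonlinear temporal alignment between the two sequences
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

def knn_predict(query_seq, train_seqs, train_labels, k=1):
    """Label a (possibly partially observed) video by the majority
    vote of its k nearest training sequences under DTW distance."""
    dists = np.array([dtw_distance(query_seq, s) for s in train_seqs])
    nearest = np.argsort(dists)[:k]
    votes = np.asarray(train_labels)[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

An SVM variant can be built analogously by feeding a precomputed DTW-based (dis)similarity matrix to a kernel SVM; the sketch above only covers the KNN case.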
