Abstract

Recognizing time-varying object states in complex tasks is an important and challenging problem. In this paper, we propose a novel model to jointly infer object fluents and complex tasks in videos. A task is a complex human activity with a specific goal, and a fluent is defined as a time-varying object state. A hierarchical graph represents a task as a human action stream together with multiple concurrent object fluents that vary as the human performs the actions. In this process, the human actions serve as the causes of object state changes, which in turn reflect the effects of those actions. For a given input video, a causal sampling search algorithm is proposed to jointly infer the task category and the states of objects in each video frame. For model learning, a structural SVM framework is adopted to jointly train the task, fluent, cause, and effect parameters. We evaluate the proposed method on a task-and-fluent dataset, and experimental results demonstrate its effectiveness.
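To make the joint representation concrete, the following is a minimal Python sketch of how a candidate parse of a video (a task label, a per-frame action stream, and concurrent object fluents) might be scored with separate task, fluent, and cause/effect terms. The data structures, weight dictionaries, and linear scoring form are illustrative assumptions, not the authors' implementation; the causal sampling search and structural SVM training described in the abstract are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class FluentTrack:
    """Time-varying state of one object: one state label per video frame."""
    object_name: str
    states: List[str]  # e.g. ["closed", "closed", "open", ...]


@dataclass
class TaskParse:
    """A candidate interpretation of a video: the task label, a per-frame
    action stream, and the concurrent object fluents that change as the
    human performs those actions."""
    task_label: str
    actions: List[str]          # one action label per frame
    fluents: List[FluentTrack]


def parse_score(parse: TaskParse,
                frame_features: np.ndarray,                   # shape: (T, D)
                w_task: Dict[str, np.ndarray],                # task appearance weights
                w_fluent: Dict[Tuple[str, str], np.ndarray],  # (object, state) weights
                w_cause: Dict[Tuple[str, str, str], float]    # (action, object, new state)
                ) -> float:
    """Score a candidate parse as a sum of task, fluent, and cause/effect
    compatibility terms; a joint inference procedure would search over
    parses to maximize such a score. All weights here are hypothetical."""
    # Task evidence pooled over the whole video.
    score = float(frame_features.mean(axis=0) @ w_task[parse.task_label])

    for fluent in parse.fluents:
        for t, state in enumerate(fluent.states):
            # Appearance evidence that the object is in this state at frame t.
            score += float(frame_features[t] @ w_fluent[(fluent.object_name, state)])
            # Cause/effect term: when the state changes, reward actions that
            # plausibly cause that transition.
            if t > 0 and state != fluent.states[t - 1]:
                score += w_cause.get((parse.actions[t], fluent.object_name, state), 0.0)
    return score
```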
