Abstract

Humans can easily tell whether a person's action is intentional. Teaching a machine to do so is much harder, because referable comparisons and reliable annotations are lacking. For a video containing an unintentional action, the annotations are often unreliable owing to the intrinsic ambiguity of the event and the subjective judgments of multiple annotators. To address this problem, we propose a framework that aggregates multiple probabilistic labels online for unintentional action localization. Specifically, we first model the uncertainty of the annotations with a temporal probability distribution, and then develop a label attention model that aggregates the reliable annotations in an online manner. We evaluate our method on the public OOPS dataset, in which each video carries multiple annotations of the unintentional action. Our experimental results show that mining reliable supervision from multiple unreliable annotations yields significant improvements over the baseline methods.
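
The two steps described above (probabilistic labels, then attention-based aggregation) can be illustrated with a minimal sketch. This is not the model evaluated in the paper: the Gaussian form of the temporal distribution, the dot-product agreement score, the `temperature` parameter, and all function names are assumptions chosen for illustration only.

```python
import math


def gaussian_label(center, num_frames, sigma=2.0):
    """Turn one annotated time index into a normalized distribution over frames.

    Assumption: a Gaussian centered on the annotated frame stands in for the
    temporal probability distribution; the abstract does not specify the form.
    """
    weights = [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(num_frames)]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_labels(prediction, labels, temperature=0.05):
    """Weight each annotator's distribution by its agreement with the prediction.

    Assumption: agreement is a dot product between the current prediction and
    each label distribution, sharpened by a hypothetical temperature.
    """
    scores = [sum(p * q for p, q in zip(prediction, label)) / temperature
              for label in labels]
    attention = softmax(scores)
    # The aggregated target is the attention-weighted mixture of the labels.
    target = [sum(a * label[t] for a, label in zip(attention, labels))
              for t in range(len(prediction))]
    return target, attention


# Three hypothetical annotators mark the onset at frames 8, 9, and 15.
num_frames = 20
annotations = [8, 9, 15]
labels = [gaussian_label(c, num_frames) for c in annotations]

# A stand-in for the model's current prediction, peaked near frame 8.5.
prediction = gaussian_label(8.5, num_frames, sigma=3.0)

target, attention = aggregate_labels(prediction, labels)
print("attention weights:", [round(a, 3) for a in attention])
```

In this toy setting the annotation at frame 15 disagrees with the other two and with the current prediction, so it receives the smallest weight in the aggregated target. Recomputing the weights at each training step from the latest prediction is one way to read "online" here, and that reading is likewise an assumption.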
