Abstract

Fine-grained activity recognition in egocentric videos, especially of cooking activities, is an active and challenging topic in computer vision. To tackle this problem, many researchers have leveraged information about cooking tools, such as knives and peelers, or about equipment in the background. Although this information improves recognition performance on general cooking activity categories, it does not provide sufficient evidence to recognize fine-grained cooking activities such as slicing and mincing, because these activities belong to the same general category and are often performed with the same type of tool. In addition, since the types of tools and equipment differ from kitchen to kitchen, a recognition model that relies too heavily on such information can overfit to the specific environments in the training data. Therefore, a method is required that has both high discriminative power for object classification and robustness to differences in environment. As a first step toward such a method, we focus on a characteristic of egocentric video: it captures the camera wearer's hands without occlusion. Hand shape is useful for recognizing the objects manipulated by the camera wearer, and sequential hand positions are effective for analyzing hand movement. Exploiting these advantages, we propose a new multi-stream CNN that has a mask image branch to leverage hand shape and position information, in addition to the RGB and optical flow branches. In experiments on fine-grained cooking activity recognition in three types of kitchens, the proposed method outperformed conventional methods and showed higher robustness to environmental differences.
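
The abstract does not state how the three branches are combined. As an illustration only, the sketch below assumes late fusion by weighted averaging of per-stream class probabilities, a common choice in multi-stream networks; it is not necessarily the authors' design. The function names, the weights, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of raw class scores into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb_logits, flow_logits, mask_logits, weights=(1.0, 1.0, 1.0)):
    """Late fusion (an assumption, not stated in the abstract):
    weighted average of the class probabilities of the three branches."""
    streams = (rgb_logits, flow_logits, mask_logits)
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all streams must score the same number of classes")
    total_w = sum(weights)
    probs = [softmax(s) for s in streams]
    n_classes = len(rgb_logits)
    return [
        sum(w / total_w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]


# Hypothetical scores for three classes, e.g. (slicing, mincing, peeling).
# RGB and optical flow are nearly undecided between the first two classes;
# the hand-mask branch is assumed to separate them more clearly.
rgb = [2.0, 1.9, 0.1]
flow = [1.0, 1.2, 0.3]
mask = [0.5, 2.5, 0.2]

fused = fuse_streams(rgb, flow, mask)
predicted = max(range(len(fused)), key=lambda c: fused[c])
```

In this toy setting the RGB branch alone slightly favors the first class, while the fused prediction follows the mask branch, which is the kind of effect the abstract attributes to hand shape and position cues.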

