Abstract

In this paper, the authors propose a scheme for automatically annotating tennis actions for content-based retrieval by using video and audio information collaboratively. Conventionally, annotation for content-based retrieval has been based on recognizing action events by analyzing information such as the trajectory or relative position of each target object in the video and their transitions. However, these methods have the drawback that, when only video information is used, important times or positions fundamentally cannot be identified because of tracking errors caused by occlusion of the target object, so that action events go undetected or are detected in error. The proposed method, applied to tennis video, uses audio information to extract the times at which a player strikes the ball, and recognizes the player's basic actions, such as a forehand swing or an overhead swing, from the positional relationship between the player and the ball at those times. The authors performed experiments to examine how strongly the recognition of a player's basic actions depends on whether audio information is used to extract the ball-impact times. The results verified the effectiveness of the approach by showing that the proposed scheme prevents several action identification errors that cannot be avoided by using only video information. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 87(11): 57–72, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20120
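To make the two stages of the scheme concrete, the sketch below shows one plausible way to (1) pick ball-impact times from the audio track as short energy spikes and (2) label a basic action from the player-ball geometry at those times. All function names, thresholds, and the coordinate conventions are assumptions for illustration; the paper's actual audio features and decision rules may differ.

```python
import numpy as np

def detect_impact_times(audio, sr, frame_len=1024, hop=512, k=4.0):
    """Find candidate ball-impact times as short, sharp energy peaks.

    A hypothetical stand-in for the paper's audio analysis: impact
    sounds are assumed to show up as brief energy spikes well above
    the average frame energy. k is an assumed tuning parameter.
    """
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len, hop)]
    energy = np.array([np.sum(f.astype(float) ** 2) for f in frames])
    thresh = energy.mean() + k * energy.std()
    peaks = [i for i in range(1, len(energy) - 1)
             if energy[i] > thresh
             and energy[i] >= energy[i - 1]
             and energy[i] >= energy[i + 1]]
    return [p * hop / sr for p in peaks]  # frame index -> seconds

def classify_swing(player_pos, ball_pos):
    """Label a basic action from player/ball positions at impact time.

    Positions are (x, y) image coordinates with y increasing downward;
    the pixel thresholds and the implied handedness are illustrative,
    not the paper's actual decision rules.
    """
    dx = ball_pos[0] - player_pos[0]
    dy = ball_pos[1] - player_pos[1]
    if dy < -40:          # ball well above the player: overhead swing
        return "overhead swing"
    return "forehand swing" if dx > 0 else "backhand swing"
```

In use, each impact time returned by the audio stage would index into the tracked player and ball positions for the corresponding video frame before calling the classifier; the point of the audio cue is that it supplies the impact times even in frames where visual tracking fails under occlusion.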
