Abstract

Since a gesture involves dynamic and complex motion, multiview observation and recognition are desirable. To represent gestures well, one first needs to know from which views a gesture should be observed. Furthermore, how the recognition results are integrated becomes increasingly important as larger numbers of camera views are considered. To investigate these problems, we propose a framework under which multiview recognition is carried out, and an integration scheme by which the recognition results are integrated online and in real time. For performance evaluation, we use the ViHASi (Virtual Human Action Silhouette) public image database as a benchmark and our Japanese Sign Language (JSL) image database, which contains 18 kinds of hand signs. By examining the recognition rates of each gesture for each view, we found gestures that exhibit view dependency and gestures that do not. We also found that the view dependency itself can vary depending on the target gesture set. By integrating the recognition results of different views, our swarm-based integration provides more robust and more accurate recognition than individual fixed-view recognition agents.

Highlights

  • For the symbiosis of humans and machines, various kinds of sensing devices will be either implicitly or explicitly embedded and networked, and will function cooperatively in our future living environment [1,2,3]

  • In [14], we investigated temporal-domain problems in gesture recognition and suggested that recognition performance can depend on the image sampling rate

  • Although there are some studies on view selection problems [15,16], they do not deal with human gestures, nor do they study how the recognition results should be integrated when larger numbers of camera views are available


Summary

Introduction

For the symbiosis of humans and machines, various kinds of sensing devices will be either implicitly or explicitly embedded and networked, and will function cooperatively in our future living environment [1,2,3]. Gesture recognition systems that function in the real world must operate in real time, including the time needed for event detection, tracking, and recognition. Since the number of cameras can be very large, distributed processing of incoming images at each camera node is inevitable in order to satisfy the real-time requirement. Improvements in recognition performance can be expected by integrating responses from each distributed processing component, but it is usually not evident how the responses should be integrated. Since a gesture is such a dynamic and complex motion, single-view observation does not necessarily guarantee good recognition performance. One needs to know from which camera views a gesture should be observed in order to quantitatively determine the optimal camera configuration and views.
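The paper's swarm-based integration scheme is not detailed in this excerpt. As a minimal sketch of the general idea of combining per-view responses online, one simple baseline is a confidence-weighted vote over the labels reported by each camera node; the function and data below are hypothetical illustrations, not the authors' method:

```python
from collections import defaultdict

def integrate_views(view_scores, view_weights=None):
    """Combine per-view recognition scores by weighted voting.

    view_scores: one dict per camera view, mapping gesture label -> confidence.
    view_weights: optional per-view reliability weights (default: equal weights).
    Returns the label with the highest combined score.
    """
    if view_weights is None:
        view_weights = [1.0] * len(view_scores)
    totals = defaultdict(float)
    for scores, weight in zip(view_scores, view_weights):
        for label, confidence in scores.items():
            totals[label] += weight * confidence
    return max(totals, key=totals.get)

# Two views disagree on a gesture; the weighted sum resolves the conflict.
view_a = {"wave": 0.6, "point": 0.4}
view_b = {"wave": 0.3, "point": 0.7}
print(integrate_views([view_a, view_b]))  # combined: wave 0.9 vs. point 1.1
```

Because each view contributes only a small score dictionary, such an integration step can run online as responses arrive, which is in the spirit of the distributed, real-time setting described above.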

