Abstract
Elderly monitoring systems are gaining attention in today's aging society. For this purpose, Far-InfraRed (FIR) sensors are often used because they avoid privacy concerns and are robust to ambient lighting conditions. The authors have previously proposed several methods for estimating the human skeleton from an extremely low-resolution FIR image sequence with a resolution of 16 × 16 pixels. To improve estimation accuracy, this paper proposes a method that is robust to variations in human position and action within the FIR sequences. Specifically, to extract position-robust features from the images with a Convolutional Neural Network (CNN), a global max-pooling layer is inserted at the last layer in place of multiple pooling layers, which are not suitable for low-resolution inputs. In addition, a two-branch network is introduced whose branches focus on capturing spatial and temporal information, respectively. The network combines the outputs of the two branches through a weighted sum whose weights depend on the human action. For evaluation, a dataset was created by capturing action sequences of a human at various positions in the FIR images. An experiment confirmed that the proposed method estimates human motion smoothly and improves estimation accuracy.
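The two ideas in the abstract, position-robust features from global max pooling and a weighted sum of branch outputs, can be illustrated with a minimal pure-Python sketch. This is not the authors' network: the 3 × 3 averaging-style kernel, the synthetic "warm blob" frames, the helper names (`conv2d_valid`, `global_max_pool`, `make_frame`, `weighted_sum`), and the use of a softmax to produce the branch weights are all assumptions made for illustration. The abstract states only that the weights depend on the human action, not how they are computed.

```python
import math


def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding (hypothetical helper)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_max_pool(feature_map):
    """Reduce a whole feature map to its single largest response."""
    return max(max(row) for row in feature_map)


def softmax(scores):
    """Numerically stable softmax; assumed here as the gating function."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(spatial_out, temporal_out, gate_scores):
    """Blend two branch outputs with weights derived from gate scores.

    In the paper the weights depend on the human action; the two-element
    gate_scores input is an illustrative stand-in for that dependence.
    """
    w_spatial, w_temporal = softmax(gate_scores)
    return [w_spatial * s + w_temporal * t
            for s, t in zip(spatial_out, temporal_out)]


def make_frame(top, left, size=16):
    """Synthetic 16 x 16 frame with a 3 x 3 warm blob at (top, left)."""
    frame = [[0.0] * size for _ in range(size)]
    for i in range(3):
        for j in range(3):
            frame[top + i][left + j] = 1.0
    return frame


kernel = [[1.0] * 3 for _ in range(3)]

# The same blob at two different positions yields the same pooled feature.
feat_a = global_max_pool(conv2d_valid(make_frame(2, 3), kernel))
feat_b = global_max_pool(conv2d_valid(make_frame(10, 8), kernel))
print(feat_a == feat_b)

# Equal gate scores give equal weights, so the result is the element-wise mean.
blended = weighted_sum([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
print(blended)
```

Because the pooled value is the maximum over the entire feature map, it does not change when the blob moves within the frame, which is the property the abstract relies on for robustness to human position. Stacking several local pooling layers on a 16 × 16 input would instead shrink the map to only a few pixels.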
More From: Journal of the Japan Society for Precision Engineering