Abstract

Elderly monitoring systems are gaining attention in today's aging society. Far-InfraRed (FIR) sensors are often used for this purpose because they avoid privacy concerns and are robust to environmental lighting. The authors have previously proposed several methods for estimating the human skeleton from extremely low-resolution FIR image sequences with a resolution of 16 × 16 pixels. To achieve more accurate estimation, this paper proposes a method that is robust to variations in human position and action within the FIR sequences. Specifically, to extract position-robust features from the images with a Convolutional Neural Network (CNN), a single global max-pooling layer is placed at the end of the network instead of multiple pooling layers, which are unsuitable for such low-resolution inputs. In addition, a two-branch network is introduced in which one branch captures spatial information and the other captures temporal information. Moreover, the network combines the outputs of the two branches with a weighted sum whose weights depend on the human action. For evaluation, a dataset was created by capturing sequences of a person performing actions at various positions in the FIR images. Experiments confirmed that the proposed method estimates human motion smoothly and improves estimation accuracy.
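The two mechanisms described above can be illustrated with a small NumPy sketch. It is not the authors' network: the feature maps are synthetic rather than produced by a CNN, and the function names, the `gate_logits` input, and the three-joint output are hypothetical. It shows (1) that global max pooling returns the same per-channel vector when a response is shifted within a 16 × 16 map, and (2) a softmax-weighted sum of two branch outputs, where the weights stand in for the action-dependent weighting; how the paper actually derives those weights is not specified in the abstract.

```python
import numpy as np


def global_max_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Reduce a (C, H, W) feature map to a length-C vector of per-channel maxima.

    The maximum ignores *where* in the H x W grid a response occurs, so the
    same response yields the same pooled vector wherever it appears.
    """
    return feature_maps.max(axis=(1, 2))


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def fuse_branches(spatial_out: np.ndarray,
                  temporal_out: np.ndarray,
                  gate_logits: np.ndarray) -> np.ndarray:
    """Weighted sum of the spatial and temporal branch outputs.

    gate_logits (shape (2,)) is a hypothetical stand-in for whatever
    action-dependent signal produces the weights; softmax keeps the two
    weights positive and summing to 1.
    """
    w = softmax(gate_logits)
    return w[0] * spatial_out + w[1] * temporal_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic 4-channel, 16 x 16 feature map with a response near the top-left.
    fmap = np.zeros((4, 16, 16))
    fmap[:, 3:6, 2:5] = rng.random((4, 3, 3))
    # The same response moved 8 rows down and 9 columns right (no wrap-around).
    shifted = np.roll(fmap, shift=(8, 9), axis=(1, 2))
    print(np.allclose(global_max_pool(fmap), global_max_pool(shifted)))  # prints True

    # Hypothetical (x, y) estimates from each branch for 3 joints.
    spatial = rng.random((3, 2))
    temporal = rng.random((3, 2))
    # Favor the temporal branch, e.g. for a fast-moving action (assumption).
    fused = fuse_branches(spatial, temporal, np.array([0.0, 1.0]))
    print(fused.shape)  # prints (3, 2)
```

Softmax is one plausible choice for the fusion weights because it keeps them positive and summing to one; the abstract does not state which normalization, if any, the proposed network uses.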
