Abstract

Falls are among the biggest threats to seniors, with significant emotional, physical, and financial implications. They are a major cause of serious injuries, disabilities, hospitalizations, and even death, especially for elderly people living alone. Timely detection enables immediate medical assistance for the injured and can avert harmful consequences. A great number of vision-based techniques have been proposed in which cameras are installed in everyday environments. Recently, deep learning has revolutionized these techniques, mostly through convolutional neural networks (CNNs). In this paper, we propose weighted multi-stream deep convolutional neural networks that exploit the rich multimodal data provided by RGB-D cameras. Our method automatically detects fall events and sends a help request to caregivers. Our contribution is three-fold. First, we build a new architecture composed of four separate CNN streams, one for each modality. The first modality is a single combined RGB and depth image that encodes static appearance information: the RGB image captures color and texture, while the depth image copes with illumination variations. In contrast to this first feature, which lacks contextual information about previous and subsequent frames, the second modality characterizes variations in human shape. After background subtraction and person detection, the human silhouette is extracted and stacked to form a history of binary motion images (HBMI). The last two modalities further discriminate the motion information: stacked flow amplitude and orientation, together with the stacked optical flow field, describe the velocity, the direction, and the motion displacements, respectively. The main motivation behind using these multimodal data is to combine complementary information (motion, shape, and RGB and depth appearance) to achieve more accurate detection than any single modality allows.
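To make the second modality concrete, the stacking of binary silhouettes into a motion-history-style image can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `history_of_binary_motion` and the linear-decay weighting are assumptions; the paper may stack the silhouettes as separate input channels or weight frames differently.

```python
import numpy as np

def history_of_binary_motion(silhouettes):
    """Combine T binary silhouette masks, each of shape (H, W), into a
    single motion-history image in which more recent frames appear
    brighter. Linear-decay weighting is an illustrative choice only."""
    T = len(silhouettes)
    weights = np.linspace(1.0 / T, 1.0, T)  # oldest frame -> lightest
    hbmi = np.zeros_like(silhouettes[0], dtype=np.float32)
    for w, mask in zip(weights, silhouettes):
        # keep, at each pixel, the weight of the most recent frame
        # in which the person's silhouette covered that pixel
        hbmi = np.maximum(hbmi, w * mask.astype(np.float32))
    return hbmi
```

A fall would appear in such an image as a bright, recent silhouette near the floor trailed by fainter upright silhouettes, which is the shape-variation cue the second stream is meant to learn.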
Our second contribution is the combination of the four streams to generate the final fall-detection decision. We evaluate early and late fusion strategies and define the weight of each modality based on its standalone system performance; based on our experiments, weighted score fusion is finally adopted. As a third contribution, transfer learning and data augmentation are applied to increase the amount of training data, avoid overfitting, and improve accuracy. Experiments conducted on publicly available standard datasets demonstrate the effectiveness of the proposed method compared to existing methods.
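The late weighted score fusion described above can be sketched as a weighted average of per-stream class scores. This is an illustrative sketch only: the function name and the idea of deriving weights from each stream's standalone validation accuracy are assumptions, and the weight values below are made up rather than taken from the paper.

```python
import numpy as np

def weighted_score_fusion(stream_scores, stream_weights):
    """Late fusion of S streams over C classes.

    stream_scores  : array-like of shape (S, C), per-stream class scores
    stream_weights : array-like of shape (S,), e.g. proportional to each
                     stream's standalone accuracy (illustrative choice)
    Returns the predicted class index and the fused score vector.
    """
    w = np.asarray(stream_weights, dtype=np.float32)
    w = w / w.sum()                                       # normalize weights
    scores = np.asarray(stream_scores, dtype=np.float32)  # (S, C)
    fused = (w[:, None] * scores).sum(axis=0)             # weighted average
    return int(np.argmax(fused)), fused
```

Giving stronger streams larger weights lets a reliable modality (e.g. the appearance stream) outvote a noisy one, which is the rationale for preferring weighted score fusion over an unweighted average.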
