Computer vision based on deep learning has served numerous human activity recognition applications, particularly those related to safety and security. However, although autistic children are frequently exposed to danger as a result of their own activities, their safety has received little attention from computer vision researchers. Some autistic children exhibit severe challenging behaviors such as the Meltdown Crisis, which is characterized by hostile behavior and loss of control. This study introduces a monitoring system capable of predicting the Meltdown Crisis early and alerting the children's parents or caregivers before the crisis escalates. The proposed system combines a pre-trained Vision Transformer (ViT) model (Swin-3D-b) with a Residual Network (ResNet) architecture to extract and learn robust spatial and temporal features from video sequences of the Stereotyped Motor Movements (SMMs) that autistic children make at the onset of the Meltdown Crisis, referred to as the Pre-Meltdown Crisis state. The model was evaluated on the MeltdownCrisis dataset, which contains realistic scenarios of autistic children's behaviors in the Pre-Meltdown Crisis state, with data from the Normal state serving as the negative class. The proposed model achieved a classification accuracy of 92%.