Abstract

This paper applies computer vision to the skill and emotion assessment of children with Autism Spectrum Disorder (ASD) by extracting bio-behaviors, human activities, child-therapist interactions, and joint pose estimates from video recordings of interactive single- or two-person play-based intervention sessions. We amassed a comprehensive dataset of 300 videos of ASD children engaged in social interaction and developed three novel deep learning-based computer vision models: 1) an activity comprehension model that analyzes child-play-partner interactions; 2) an automatic joint attention recognition framework based on pose estimation; and 3) an emotion and facial expression recognition model. We tested the models on 68 unseen real-world videos of children drawn from clinical recordings and public datasets. The activity comprehension model achieves an overall accuracy of 72.32%; the joint attention models achieve 97% accuracy for eye-gaze following and 93.4% for hand pointing; and the facial expression recognition model achieves an overall accuracy of 95.1%. The proposed models can extract activities and behaviors of interest from free-play and intervention session videos with limited supervision, empowering clinicians with data useful for the diagnosis, assessment, treatment formulation, and monitoring of children with ASD.
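
To illustrate the kind of pose-based signal the joint attention framework could consume, the sketch below estimates a hand-pointing direction from 2D body landmarks. This is a minimal sketch, not the paper's actual pipeline: it assumes MediaPipe Pose as the landmark detector, and the frame path and the choice of the right arm are hypothetical.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose


def pointing_direction(frame_bgr, pose):
    """Return a unit vector from right elbow to right wrist, a crude
    proxy for hand-pointing direction, or None if no person is found."""
    # MediaPipe expects RGB input; OpenCV loads frames as BGR.
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    lm = results.pose_landmarks.landmark
    elbow = np.array([lm[mp_pose.PoseLandmark.RIGHT_ELBOW].x,
                      lm[mp_pose.PoseLandmark.RIGHT_ELBOW].y])
    wrist = np.array([lm[mp_pose.PoseLandmark.RIGHT_WRIST].x,
                      lm[mp_pose.PoseLandmark.RIGHT_WRIST].y])
    v = wrist - elbow
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else None


if __name__ == "__main__":
    with mp_pose.Pose(static_image_mode=True) as pose:
        frame = cv2.imread("session_frame.jpg")  # hypothetical frame path
        if frame is not None:
            print(pointing_direction(frame, pose))
```

In practice, a classifier along the lines of the paper's joint attention model would compare such a direction vector against the partner's or a target object's location across frames, rather than inspect a single image.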
