Restricted and repetitive behaviors (RRBs) are a hallmark feature of autism spectrum disorder (ASD) in children and one of its diagnostic criteria. Traditional RRB assessment relies on manual observation, which is limited by low diagnostic efficiency and uncertain outcomes. As a result, AI-assisted screening for autism has emerged as a promising research direction. In this study, we explore the synergy between visual foundation models and multimodal large language models (MLLMs) and propose a Multi-Model Synergistic Restricted and Repetitive Behavior Recognition method (MS-RRBR). Based on this method, we developed an interpretable multi-model autonomous question-answering system. To evaluate our approach, we collected and annotated the Autism Restricted and Repetitive Behavior Dataset (ARRBD), which covers 10 ASD-related behaviors that are easily observable from various visual perspectives. Experimental results on ARRBD show that multi-model collaboration outperforms single-model approaches, reaching a best recognition accuracy of 94.94%. MS-RRBR leverages the extensive linguistic knowledge of GPT-4o to enhance the zero-shot visual recognition capabilities of the MLLM, while also providing clear explanations for system decisions. This approach holds promise for providing timely, reliable, and accurate technical support for clinical diagnosis and educational rehabilitation in ASD.