A lightweight CNN and Transformer hybrid model for mental retardation screening among children from spontaneous speech

Wei Meng, Qianhong Zhang, Simeng Ma, Mincheng Cai, Dujuan Liu, Zhongchun Liu, Jun Yang

doi:10.1016/j.compbiomed.2022.106281

Abstract

Mental retardation (MR) is a group of mental disorders characterized by low intelligence and difficulties in social adjustment. Early diagnosis enables timely intervention for children with MR and can reduce the degree of disability. Children with MR typically have impaired speech functions compared with typically developing children, which makes speech a significant cue for clinical diagnosis. On this basis, our study proposes a spontaneous speech-based framework (MT-Net) for MR screening, which merges mobile inverted bottleneck convolutional blocks (MBConv) and visual Transformer blocks. MT-Net takes log-mel spectrograms converted from raw interview speech as its input, and uses MBConv and visual Transformer blocks to learn both low-level and high-level features. In addition, SpecAugment, a data augmentation strategy, is used to expand our audio dataset and further improve the performance of MT-Net. The experimental results show that the proposed MT-Net outperforms a Transformer network (ViT) and convolutional neural networks (ResNet18, MobileNetV2, EfficientNetV2), achieving an accuracy of 91.60% with SpecAugment. MT-Net has fewer parameters, low computational cost, and high prediction accuracy, and is expected to serve as an auxiliary screening tool for MR.
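As an illustration of the augmentation step named in the abstract, the following is a minimal sketch of SpecAugment-style frequency and time masking applied to a log-mel spectrogram stored as a NumPy array of shape (mel bins, frames). The function name `spec_augment`, the mask counts and widths, and the choice to fill masked regions with the spectrogram mean are assumptions made for this example, not settings reported by the authors. The time-warping component of SpecAugment is omitted.

```python
import numpy as np


def spec_augment(log_mel, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Return a masked copy of a (n_mels, n_frames) log-mel spectrogram.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = log_mel.copy()          # leave the input untouched
    n_mels, n_frames = augmented.shape
    fill = augmented.mean()             # assumed fill value for masked cells

    # Frequency masking: overwrite a random band of consecutive mel bins.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_mels)
        start = int(rng.integers(0, n_mels - width + 1))
        augmented[start:start + width, :] = fill

    # Time masking: overwrite a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_frames)
        start = int(rng.integers(0, n_frames - width + 1))
        augmented[:, start:start + width] = fill

    return augmented


if __name__ == "__main__":
    # Synthetic stand-in for a log-mel spectrogram (64 mel bins, 300 frames).
    demo = np.random.default_rng(0).normal(size=(64, 300))
    masked = spec_augment(demo, rng=np.random.default_rng(1))
    print(masked.shape)
```

Each call draws new mask positions, so applying the function several times to the same spectrogram yields several distinct training samples, which is how masking can expand a small audio dataset.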

Full Text