Deep-learning methods based on deep neural networks (DNNs) have recently been successfully utilized in the analysis of neuroimaging data. A convolutional neural network (CNN) is a type of DNN that employs a convolution kernel that covers a local area of the input sample and moves across the sample to provide a feature map for the subsequent layers. In our study, we hypothesized that a 3D-CNN model with down-sampling operations such as pooling and/or stride would have the ability to extract robust feature maps from the shifted and scaled neuronal activations in a single functional MRI (fMRI) volume for the classification of task information associated with that volume. Thus, the 3D-CNN model would be able to ameliorate the potential misalignment of neuronal activations and over-/under-activation in local brain regions caused by imperfections in spatial alignment algorithms, confounded by variability in blood-oxygenation-level-dependent (BOLD) responses across sessions and/or subjects. To this end, the fMRI volumes acquired from four sensorimotor tasks (left-hand clenching, right-hand clenching, auditory attention, and visual stimulation) were used as input for our 3D-CNN model to classify task information using a single fMRI volume. The classification performance of the 3D-CNN was systematically evaluated using fMRI volumes obtained from various minimal preprocessing scenarios applied to raw fMRI volumes that excluded spatial normalization to a template and those obtained from full preprocessing that included spatial normalization. Alternative classifier models such as the 1D fully connected DNN (1D-fcDNN) and support vector machine (SVM) were also used for comparison. The classification performance was also assessed for several k-fold cross-validation (CV) schemes, including leave-one-subject-out CV (LOOCV). Overall, the classification results of the 3D-CNN model were superior to that of the 1D-fcDNN and SVM models. When using the fully-processed fMRI volumes with LOOCV, the mean error rates (± the standard error of the mean) for the 3D-CNN, 1D-fcDNN, and SVM models were 2.1% (± 0.9), 3.1% (± 1.2), and 4.1% (± 1.5), respectively (p = 0.041 from a one-way ANOVA). The error rates for 3-fold CV were higher (2.4% ± 1.0, 4.2% ± 1.3, and 10.1% ± 2.0; p < 0.0003 from a one-way ANOVA). The mean error rates also increased considerably using the raw fMRI 3D volume data without preprocessing (26.2% for the 3D-CNN, 75.0% for the 1D-fcDNN, and 75.0% for the SVM). Furthermore, the ability of the pre-trained 3D-CNN model to handle shifted and scaled neuronal activations was demonstrated in an online scenario for five-class classification (i.e., four sensorimotor tasks and the resting state) using the real-time fMRI of three participants. The resulting classification accuracy was 78.5% (± 1.4), 26.7% (± 5.9), and 21.5% (± 3.1) for the 3D-CNN, 1D-fcDNN, and SVM models, respectively. The superior performance of the 3D-CNN compared to the 1D-fcDNN was verified by analyzing the resulting feature maps and convolution filters that handled the shifted and scaled neuronal activations and by utilizing an independent public dataset from the Human Connectome Project.
Read full abstract