Abstract

Background
Alzheimer's disease (AD) is a neurodegenerative disorder that causes gradual and irreversible damage to the brain. These patterns of brain damage can be detected in T1-weighted MRI scans using modern neuroimaging methods. Recent developments in convolutional neural networks (CNNs) have achieved promising results in various computer-vision tasks. Feature attribution methods such as layer-wise relevance propagation (LRP) allow tracing back the information flow in CNNs to derive relevance heatmaps, which approximate the contribution of each input image region to the model decision. We addressed the open question of which of the most common CNN architectures is best suited for medical image detection, specifically AD classification based on MRI data.

Method
Four CNN architectures are widely used in the literature: AlexNet, VGG, ResNet, and DenseNet. We adapted these architectures for use with 3D brain MRI data and trained the models on a heterogeneous dataset with N > 2200 from four large studies. We applied tenfold cross-validation and additionally evaluated the results on an independent test dataset.

Result
The more complex CNN architectures with 'skip connections', DenseNet and ResNet, provided the best results for both group separation tasks, AD vs cognitively normal (CN) and mild cognitive impairment (MCI) vs CN, although the overall differences in accuracy did not reach statistical significance. From the mean relevance maps obtained using LRP (see Fig. 1), we found that DenseNet focused primarily on atrophy in the medial temporal lobe and posterior cingulate cortex. ResNet's mean relevance maps were noisier and more heterogeneous.

Conclusion
DenseNet provided the most focused relevance maps, best matching a priori expectations of the brain regions contributing to the detection of AD. Its 'dense connections' between the layers enabled a highly efficient information flow at various scales.
Our evaluation setup also demonstrated the value of a holistic evaluation of models, where relevance maps are used in combination with classical performance metrics.
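
To illustrate how LRP turns a model decision into a relevance map, the following minimal sketch propagates relevance backwards through a toy two-layer ReLU network. It is not the implementation used in this study: the abstract does not state which LRP rule was applied, so the epsilon rule is assumed here for illustration, and fully connected layers stand in for the 3D convolutional layers of the actual models. The function names `lrp_epsilon_dense` and `toy_relevance` are hypothetical.

```python
import numpy as np


def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """LRP epsilon rule for one dense layer (assumed rule, for illustration).

    a: layer input activations, shape (n_in,)
    W: weights, shape (n_in, n_out); b: biases, shape (n_out,)
    R_out: relevance assigned to the layer outputs, shape (n_out,)
    Returns the relevance redistributed to the inputs, shape (n_in,).
    """
    z = a @ W + b                               # pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer, avoids division by zero
    s = R_out / z                               # relevance per unit of pre-activation
    return a * (W @ s)                          # share of each input in each output


def toy_relevance(x, W1, b1, W2, b2, target):
    """Forward pass through a two-layer ReLU network, then LRP back to the input."""
    h = np.maximum(0.0, x @ W1 + b1)            # hidden ReLU activations
    out = h @ W2 + b2                           # output scores (logits)
    R_out = np.zeros_like(out)
    R_out[target] = out[target]                 # start from the target class score
    R_h = lrp_epsilon_dense(h, W2, b2, R_out)   # relevance of hidden units
    R_x = lrp_epsilon_dense(x, W1, b1, R_h)     # relevance of input features
    return out, R_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(8)                           # stand-in for flattened image voxels
    W1, b1 = rng.normal(size=(8, 5)), np.zeros(5)
    W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)
    out, R_x = toy_relevance(x, W1, b1, W2, b2, target=0)
    print("target score:", out[0])
    print("sum of input relevance:", R_x.sum())
```

With zero biases and a small `eps`, the input relevances sum approximately to the target class score. This conservation property is what allows a relevance map to be read as a decomposition of the model decision over image regions.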