Convolutional neural networks (CNNs) that extract structural information from structural magnetic resonance imaging (sMRI), combined with functional magnetic resonance imaging (fMRI) and neuropsychological features, have emerged as a pivotal tool for the early diagnosis of Alzheimer's disease (AD). However, the fixed-size convolutional kernels in CNNs are limited in their ability to capture global features, which reduces the effectiveness of AD diagnosis. We introduce a group self-calibrated coordinate attention network (GSCANet) designed for the precise diagnosis of AD using multimodal data, encompassing Haralick texture features, functional connectivity, and neuropsychological scores. GSCANet uses a parallel group self-calibrated module to enhance the original spatial features and expand the field of view, and embeds spatial information into channel information through a coordinate attention module, which ensures long-range contextual interaction. In a four-way classification (AD vs. early MCI (EMCI) vs. late MCI (LMCI) vs. normal control (NC)), GSCANet achieved an accuracy of 78.70%. In the three-way classification (AD vs. MCI vs. NC), it achieved an accuracy of 83.33%. Moreover, our method achieved high accuracies in the AD vs. NC (92.81%) and EMCI vs. LMCI (84.67%) classifications. GSCANet improves classification performance across the stages of AD by employing group self-calibration to expand the receptive field of the features and by integrating coordinate attention to facilitate meaningful interaction between channel and spatial information. The method provides insights into AD mechanisms and is scalable to the prediction of other diseases.
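
To make the coordinate attention step referred to above concrete, the following is a minimal PyTorch sketch of a coordinate-attention block that embeds positional (height and width) information into channel attention weights. The class name, reduction ratio, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a coordinate-attention block (assumed structure,
# not the authors' GSCANet code). Pools features along each spatial axis,
# then uses the pooled descriptors to reweight channels by position.
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (B, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Direction-aware pooling keeps positional information along one axis.
        feat_h = self.pool_h(x)                      # (B, C, H, 1)
        feat_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.shared(torch.cat([feat_h, feat_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # reweight features by spatial position


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(CoordinateAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch, the two attention maps broadcast over the input so each channel is modulated jointly by its row-wise and column-wise descriptors, which is one way to obtain the long-range spatial-channel interaction the abstract describes.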