The threat that Alzheimer's disease (AD) poses to human health has grown significantly, yet accurately diagnosing AD and classifying its stages remains challenging. Neuroimaging methods such as structural magnetic resonance imaging (sMRI) and fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to diagnose and categorize AD. However, the feature selection approaches frequently used to extract additional information from multimodal imaging are prone to errors. This paper proposes combining sMRI and FDG-PET data using a static pulse-coupled neural network (PCNN) and a Laplacian pyramid. The fused images, augmented to avoid overfitting, are then used to train a Mobile Vision Transformer (MViT), which classifies unfused MRI and FDG-PET images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Open Access Series of Imaging Studies (OASIS) datasets into stages of AD. The architectural hyperparameters of the MViT are tuned by neural architecture search using Pareto-Optimal Quantum Dynamic Optimization, which yields Pareto-optimal solutions. The quality of the fused images is measured with the Peak Signal-to-Noise Ratio (PSNR), the Mean Squared Error (MSE), and the Structural Similarity Index Measure (SSIM). The fused images were consistent across all metrics, with an SSIM of 0.64, a PSNR of 35.60 dB, and an MSE of 0.21 for the FDG-PET image. On the ADNI MRI test data, the proposed method achieves a precision of 94.73%, 92.98%, and 89.36% for AD vs. cognitively normal (CN), AD vs. mild cognitive impairment (MCI), and CN vs. MCI classification, respectively; the corresponding sensitivities are 90.70%, 90.70%, and 90.91%, and the specificities are 100%, 100%, and 85.71%.
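To make the Laplacian-pyramid fusion and the MSE/PSNR quality metrics more concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. The static PCNN that drives the paper's fusion decision is replaced here by a plainly substituted max-absolute-coefficient rule on the detail layers (with averaging on the low-pass residual). The pyramid uses 2×2 block averaging and nearest-neighbour upsampling rather than Gaussian filtering. All helper names (`laplacian_pyramid`, `reconstruct`, `fuse`, `mse`, `psnr`) are hypothetical. PSNR assumes an 8-bit peak value of 255, because the abstract does not state which intensity scale was used. SSIM is omitted; scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def downsample(img):
    # Average non-overlapping 2x2 blocks (assumes even height and width).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour upsampling by a factor of 2 along each axis.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    # Returns [detail_0, ..., detail_{levels-1}, low_pass_residual].
    pyramid, current = [], img.astype(np.float64)
    for _ in range(levels):
        down = downsample(current)
        pyramid.append(current - upsample(down))  # band-pass detail layer
        current = down
    pyramid.append(current)  # coarsest low-pass residual
    return pyramid

def reconstruct(pyramid):
    # Exact inverse of laplacian_pyramid: upsample and add back each detail layer.
    img = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        img = upsample(img) + detail
    return img

def fuse(mri, pet, levels=3):
    # Hypothetical fusion rule standing in for the paper's static PCNN:
    # keep the larger-magnitude detail coefficient, average the residuals.
    pa, pb = laplacian_pyramid(mri, levels), laplacian_pyramid(pet, levels)
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa[:-1], pb[:-1])]
    fused.append((pa[-1] + pb[-1]) / 2.0)
    return reconstruct(fused)

def mse(ref, img):
    # Mean squared error between two images of identical shape.
    return float(np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2))

def psnr(ref, img, peak=255.0):
    # PSNR in dB; assumes an 8-bit peak of 255 unless another peak is given.
    err = mse(ref, img)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))
```

For example, `fuse(mri, pet, levels=3)` on two co-registered 64×64 arrays returns a fused 64×64 image, which can then be scored with `psnr(pet, fused)` and `mse(pet, fused)`. In this simplified pyramid, each image side must be divisible by `2**levels`.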