Accurate diagnosis of plant diseases is crucial for crop health. This study introduces EDA–ViT, a Vision Transformer (ViT)-based model that integrates adaptive entropy-based data augmentation for diagnosing diseases of custard apple (Annona squamosa). Conventional models such as convolutional neural networks (CNNs) and ViTs face challenges with local feature extraction and large dataset requirements. EDA–ViT addresses these challenges with multi-scale weighted feature aggregation and a feature interaction module, which together enhance both local and global feature extraction. The adaptive data augmentation method refines the training process, improving accuracy and robustness. On a dataset of 8226 images, EDA–ViT achieved a classification accuracy of 96.58%, an F1 score of 96.10%, and a Matthews Correlation Coefficient (MCC) of 92.24%, outperforming other models. The Deformable Multi-head Self-Attention (DMSA) mechanism further enhanced feature capture. Ablation studies showed that the adaptive augmentation contributed a 0.56% improvement in accuracy and a 0.34% increase in MCC. In summary, EDA–ViT offers an innovative solution for custard apple disease diagnosis, with potential applications in broader agricultural disease detection, ultimately aiding precision agriculture and crop health management.
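
For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, F1, and MCC can be computed from a multi-class confusion matrix. It is not the authors' evaluation code. The function names and the toy matrix are hypothetical, the F1 score is assumed here to be macro-averaged (the abstract does not state the averaging scheme), and MCC uses the standard multi-class generalization. MCC ranges from -1 to 1, so a reported value of 92.24% corresponds to 0.9224 on that scale.

```python
from math import sqrt

# Convention assumed here: cm[i][j] = number of samples with true class i
# that were predicted as class j.

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0

def macro_f1(cm):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        true_k = sum(cm[k])                       # TP + FN for class k
        pred_k = sum(cm[i][k] for i in range(n))  # TP + FP for class k
        denom = true_k + pred_k                   # 2*TP + FP + FN
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

def mcc(cm):
    """Multi-class Matthews correlation coefficient."""
    n = len(cm)
    s = sum(sum(row) for row in cm)               # total samples
    c = sum(cm[k][k] for k in range(n))           # correctly classified
    t = [sum(cm[k]) for k in range(n)]            # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted count
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = sqrt((s * s - sum(pk * pk for pk in p)) *
                       (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0

# Toy two-class matrix, purely illustrative (not data from the study).
toy_cm = [[5, 1],
          [2, 4]]
print(accuracy(toy_cm), macro_f1(toy_cm), mcc(toy_cm))
```

Unlike accuracy, MCC takes every cell of the confusion matrix into account, which makes it a useful complementary measure when class sizes are unequal.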