Timely and efficient mapping of date palm plantations through unmanned aerial vehicle (UAV) remote sensing is critical for continuous monitoring, health and risk assessment, pest management, resource optimization, and the long-term sustainability of the date industry. This study presents an efficient and cost-effective transformer-based approach to detect, count, monitor, and assess the overall health of date palm trees in large-scale UAV imagery. The proposed approach integrates an improved multiscale vision transformer, a feature pyramid network, Mask R-CNN, and improved slicing-aided hyper inference for practical large-scale assessments. This combination enables the extraction of multiscale features, captures long-range dependencies in the data, and improves the model's generalizability. The proposed architecture outperformed several CNN-based architectures (including Mask R-CNN, Cascade Mask R-CNN, Point-based Rendering (PointRend), and You Only Look At CoefficienTs (YOLACT)), achieving F-scores of 94.33% and 94.2% for date palm tree detection and segmentation, respectively. The transformer-based architecture was then adapted using transfer learning to differentiate between healthy and unhealthy date palm trees, particularly those with severe infestations. The overall condition of date palm trees was predicted with an F-score of 88.4%. Further advances in this direction could pave the way for a proactive strategy that enables timely detection of unhealthy trees, aiding pest management and supporting the sustainable growth of the date sector.
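To illustrate the slicing idea behind slicing-aided inference on large UAV images, the following is a minimal sketch rather than the method used in this study. It assumes a fixed square tile size and a fractional overlap between neighbouring tiles; the function names `slice_windows` and `to_global`, the default tile size of 512 pixels, and the 20% overlap are hypothetical choices made only for illustration. The detector itself and the step that merges duplicate detections across overlapping tiles are omitted.

```python
def slice_windows(width, height, tile=512, overlap=0.2):
    """Return (x0, y0, x1, y1) windows that cover a width x height image.

    tile and overlap are illustrative assumptions, not values from the study.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the left (or top) edges of neighbouring tiles.
    step = max(1, int(tile * (1 - overlap)))

    def starts(size):
        # An image side shorter than one tile needs a single window.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        # The last window is aligned to the image border so nothing is missed.
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_global(box, window):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    dx, dy = window[0], window[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


if __name__ == "__main__":
    for window in slice_windows(1000, 600):
        print(window)
```

Each window would be passed to the detector separately, and its predictions mapped back with `to_global` before overlapping detections are merged. Smaller tiles keep individual palm crowns large relative to the network input, at the cost of more forward passes per image.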