This study assessed the accuracy of deep learning models for diagnosing maxillary fungal ball rhinosinusitis (MFB) and compared their accuracy, sensitivity, specificity, precision, and F1-score with those of a rhinologist. Data were collected from 1539 adult chronic rhinosinusitis (CRS) patients who underwent paranasal sinus computed tomography (CT). The overall dataset consisted of 254 MFB cases and 1285 non-MFB cases. The CT images were constructed and labeled for building the deep learning models. Seventy percent of the images were used for training the models, and 30% were used for testing. Two approaches were evaluated, each with three architectures: whole image analysis with MobileNetV3, ResNet50, and ResNet101, and instance segmentation with YOLOv5X-SEG, YOLOv8X-SEG, and YOLOv9-C-SEG. Receiver operating characteristic (ROC) curves were assessed. Accuracy, sensitivity (recall), specificity, precision, and F1-score were compared between the models and a rhinologist, and Kappa agreement was evaluated. Whole image analysis showed lower precision, recall, and F1-score than instance segmentation. The area under the ROC curve was 0.86 for whole image analysis and 0.88 for instance segmentation. On the testing dataset for whole image analysis, the MobileNetV3 model showed 81.00% accuracy, 47.40% sensitivity, 87.90% specificity, 66.80% precision, and a 67.20% F1-score. Instance segmentation performed best, with YOLOv8X-SEG achieving 94.10% accuracy, 85.90% sensitivity, 95.80% specificity, 88.90% precision, and an 89.80% F1-score. The rhinologist achieved 93.5% accuracy, 84.6% sensitivity, 95.3% specificity, 78.6% precision, and an 81.5% F1-score. Deep learning models that apply instance segmentation to paranasal sinus CT images, with enhanced lesion localization, are a promising and practical aid for physicians in diagnosing maxillary fungal ball.
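For readers who want to reproduce the reported metrics from their own predictions, the following is a minimal sketch of how accuracy, sensitivity, specificity, precision, and F1-score follow from a binary confusion matrix with MFB as the positive class. The class name `ConfusionCounts`, the function names, and the example counts are hypothetical and are not taken from the study. The abstract does not state how the F1-scores were averaged, so the positive-class definition used here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix with MFB as the positive class (hypothetical helper)."""
    tp: int  # MFB cases predicted as MFB
    fp: int  # non-MFB cases predicted as MFB
    tn: int  # non-MFB cases predicted as non-MFB
    fn: int  # MFB cases predicted as non-MFB


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (positive-class F1)."""
    return _ratio(2 * precision * recall, precision + recall)


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the five metrics named in the abstract from raw counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the study's confusion matrix.
    example = ConfusionCounts(tp=80, fp=10, tn=390, fn=20)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")

    # Check against the rhinologist's reported precision and sensitivity.
    print(f"rhinologist F1: {f1_score(0.786, 0.846):.3f}")
```

Under this definition, the rhinologist's reported precision (78.6%) and sensitivity (84.6%) give an F1-score of about 81.5%, which agrees with the value stated in the abstract.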