Ameloblastoma (AM), periapical cyst (PC), and chronic suppurative osteomyelitis (CSO) are prevalent maxillofacial diseases with similar imaging characteristics but different treatments, thus making preoperative differential diagnosis crucial. Existing deep learning methods for diagnosis often require manual delineation in tagging the regions of interest (ROIs), which triggers some challenges in practical application. We propose a new model of Wavelet Extraction and Fusion Module with Vision Transformer (WaveletFusion-ViT) for automatic diagnosis using CBCT panoramic images. In this study, 539 samples containing healthy (n = 154), AM (n = 181), PC (n = 102), and CSO (n = 102) were acquired by CBCT for classification, with an additional 2000 healthy samples for pre-training the domain-adaptive network (DAN). The WaveletFusion-ViT model was initialized with pre-trained weights obtained from the DAN and further trained using semi-supervised learning (SSL) methods. After five-fold cross-validation, the model achieved average sensitivity, specificity, accuracy, and AUC scores of 79.60%, 94.48%, 91.47%, and 0.942, respectively. Remarkably, our method achieved 91.47% accuracy using less than 20% labeled samples, surpassing the fully supervised approach's accuracy of 89.05%. Despite these promising results, this study's limitations include a low number of CSO cases and a relatively lower accuracy for this condition, which should be addressed in future research. This research is regarded as an innovative approach as it deviates from the fully supervised learning paradigm typically employed in previous studies. The WaveletFusion-ViT model effectively combines SSL methods to effectively diagnose three types of CBCT panoramic images using only a small portion of labeled data.