The detection and classification of cystic lesions of the jaw is of high clinical relevance and an active topic in medical artificial intelligence research. Human clinical diagnostic reasoning draws on contextual information, including the spatial relation of a detected lesion to other anatomical structures, to establish a preliminary classification. Here, we aimed to emulate this reasoning step by step using a combined object detection and image segmentation approach on panoramic radiographs (OPGs). We used a multicenter training dataset of 855 OPGs (all positives) and an evaluation set of 384 OPGs (240 negatives). We further compared our models to an international human control group of ten dental professionals from seven countries. The object detection model achieved an average precision of 0.42 (intersection over union (IoU): 0.50, maximum detections: 100) and an average recall of 0.394 (IoU: 0.50–0.95, maximum detections: 100). The classification model achieved a sensitivity of 0.84 for odontogenic cysts and 0.56 for non-odontogenic cysts, and a specificity of 0.59 for odontogenic cysts and 0.84 for non-odontogenic cysts (IoU: 0.30). The human control group achieved a sensitivity of 0.70 for odontogenic cysts, 0.44 for non-odontogenic cysts, and 0.56 for OPGs without cysts, and a specificity of 0.62 for odontogenic cysts, 0.95 for non-odontogenic cysts, and 0.76 for OPGs without cysts. Taken together, our results show that a combined object detection and image segmentation approach can feasibly emulate the human clinical diagnostic reasoning process for classifying cystic lesions of the jaw.
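For readers less familiar with the reported metrics, the following minimal Python sketch illustrates how an IoU value between two bounding boxes and a per-class (one-vs-rest) sensitivity and specificity could be computed. It is not the evaluation code used in this study: the box format `(x_min, y_min, x_max, y_max)`, the function names `iou` and `sensitivity_specificity`, and the class label strings are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero if disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Return NaN when a rate is undefined (no positives or no negatives).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Under these assumptions, a predicted lesion box would count as a match to a reference box when `iou(pred, ref)` meets the chosen threshold (for example 0.30 or 0.50), and `sensitivity_specificity` would be called once per class label, treating all other labels as negatives.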