Deep learning algorithms have gained attention for the computer-aided detection (CADe) of biliary tract cancer (BTC) in digital single-operator cholangioscopy (dSOC). We developed a multimodal convolutional neural network (CNN) for detection (CADe) and for characterization and discrimination (CADx) of malignant, inflammatory, and normal biliary tissue in raw dSOC videos. In addition, clinical metadata were incorporated into the CNN to overcome the limitations of image-only models. Based on dSOC videos and images from 111 patients (15,158 still frames in total), we developed and validated a real-time CNN-based algorithm for CADe and CADx, establishing both an image-only model and a metadata-injection approach. We further validated frame-wise and case-based predictions on complete dSOC video sequences. Model embeddings were visualized, and class activation maps highlighted the image regions relevant to the predictions. With respect to malignancy, the concatenation-based CADx approach achieved a per-frame AUC of 0.871, sensitivity of 0.809 (95% CI 0.784-0.832), specificity of 0.773 (0.761-0.785), PPV of 0.450 (0.423-0.467), and NPV of 0.946 (0.940-0.954) on 5,715 test frames from the complete videos of 20 patients. For case-based diagnosis using averaged prediction scores, six of eight malignant cases and all twelve benign cases were identified correctly. Our algorithm distinguishes malignant and inflammatory bile duct lesions in dSOC videos, indicating the potential of CNN-based diagnostic support systems for both CADe and CADx. The integration of non-image data can improve CNN-based support systems, addressing current challenges in the assessment of biliary strictures.
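
To illustrate the two ideas named above (concatenation-based fusion of image features with clinical metadata, and case-level diagnosis by averaging per-frame scores), the following is a minimal sketch only. The backbone (ResNet-50), hidden layer size, dropout rate, three-class head, and the helper names MultimodalCADx and case_level_prediction are assumptions for illustration and are not the architecture or hyperparameters reported here.

```python
import torch
import torch.nn as nn
from torchvision import models


class MultimodalCADx(nn.Module):
    """Sketch of concatenation-based image + metadata fusion:
    CNN image features are concatenated with a clinical-metadata
    vector before the classification head (assumed layout)."""

    def __init__(self, num_metadata: int, num_classes: int = 3):
        super().__init__()
        backbone = models.resnet50(weights=None)  # backbone choice is an assumption
        feat_dim = backbone.fc.in_features        # 2048-d feature vector
        backbone.fc = nn.Identity()               # strip the ImageNet classifier
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim + num_metadata, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),          # e.g. malignant / inflammatory / normal
        )

    def forward(self, frames: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(frames)                 # (batch, 2048)
        fused = torch.cat([img_feat, metadata], dim=1)   # concatenation-based fusion
        return self.head(fused)                          # per-frame class logits


def case_level_prediction(frame_probs: torch.Tensor) -> torch.Tensor:
    """Average per-frame softmax scores over a complete dSOC video
    to obtain a single case-based prediction score per class."""
    return frame_probs.mean(dim=0)
```

In this sketch, frame_probs would hold the softmax scores of all frames from one video; a case would be called malignant when the averaged malignancy score exceeds a chosen threshold, mirroring the case-based evaluation described above.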