Abstract

Background
Advances in machine learning (ML) for biomedical research have led to ground-breaking results that are evident in several healthcare settings. In the field of NLP, there are applications for text mining, named entity recognition, classification of pathology and radiology reports, and text generation. Despite these advancements, the main barrier to the use of AI systems in a clinical setting is their lack of explainability and interpretability.

Objective
Of the various avenues available for creating transparent and explainable ML models, we investigated how a stable, accurate, and trusted biomedical standard, the Unified Medical Language System (UMLS), can be applied to retrospectively justify and explain the results of ML models.

Methods
We developed a novel architecture that places a UMLS-based system after the ML model; this system acts as a verifier that confirms the accuracy, or lack thereof, of the model's results and then goes further to explain them. The architecture is intended to be model-agnostic, so we evaluated its effectiveness on two NLP tasks: classification and Named Entity Recognition (NER). For classification, the UMLS-based verifier was applied to the results of a Multi-Task Convolutional Neural Network (MT-CNN) classifying the topographies in 1,964 unstructured and anonymized breast cancer pathology reports. For NER, the UMLS-based verifier was applied to the results of the HunFlair model on unstructured and anonymized breast, colon, and small intestine cancer pathology reports.

Results
For the classification evaluation, we found that an entity's National Cancer Institute Thesaurus (NCIt) code can be used to obtain a topographical range for individual entities in a pathology report. We further found that, whilst there are entities whose topographical range contributes positively towards a report's overall topography classification, there are also entities that contribute negatively, and that the number of these negatively contributing entities is inversely proportional to the confidence value from the ML model. For the NER evaluation, we found that the UMLS-based verifier is able both to confirm accurate model annotations and to group together the different kinds of inaccuracies found. Additionally, the grouping assigned to an incorrectly tagged entity was found to correlate with lower confidence values from the model.

Conclusion
The architecture we propose retrospectively verifies and explains the results of ML models, thus providing a level of interpretability to a model's outputs. Our use of an industry-standard healthcare knowledge repository, the UMLS, is an important contribution towards trusting the results of AI systems in healthcare.

Citation Format: Joan Byamugisha, Waheeda Saib, Theodore Gaelejwe, Asad Jeewa, Maletsabisa Molapo. Towards verifying results from biomedical deep learning models using the UMLS: Cases of primary tumor site classification and cancer Named Entity Recognition [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PR-12.
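The abstract gives no implementation details, but the classification-verification step it describes can be sketched in a few lines of Python. The sketch below is a minimal illustration under assumed interfaces, not the authors' implementation: the NCIt-code-to-topography mapping is stubbed with a hypothetical in-memory table (the NCIt codes shown are placeholders, not real NCIt identifiers), and entity extraction and NCIt linking are assumed to have already happened upstream.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the UMLS/NCIt lookup: maps an entity's NCIt
# code to the set of ICD-O topography codes it is compatible with. In a
# real system this would be derived from the UMLS Metathesaurus; the
# entries below are illustrative placeholders, not actual NCIt data.
NCIT_TOPOGRAPHY_RANGES = {
    "NCIT:0001": {"C50"},  # placeholder breast-related concept -> breast (C50)
    "NCIT:0002": {"C18"},  # placeholder colon-related concept -> colon (C18)
}

@dataclass
class Entity:
    text: str
    ncit_code: str

def verify_topography(entities, predicted_topography):
    """Split a report's entities into those whose NCIt-derived topographical
    range supports the model's predicted topography and those that
    contradict it; entities with no known range are ignored."""
    supporting, contradicting = [], []
    for ent in entities:
        topo_range = NCIT_TOPOGRAPHY_RANGES.get(ent.ncit_code, set())
        if predicted_topography in topo_range:
            supporting.append(ent)
        elif topo_range:  # a known range that excludes the prediction
            contradicting.append(ent)
    return supporting, contradicting

# Example: verify an MT-CNN prediction of breast (C50) for one report.
entities = [Entity("invasive ductal carcinoma", "NCIT:0001"),
            Entity("adenocarcinoma of colon", "NCIT:0002")]
supporting, contradicting = verify_topography(entities, "C50")
print(f"{len(supporting)} supporting, {len(contradicting)} contradicting")
```

Consistent with the finding reported in the Results, a large count of contradicting entities for a report would be expected to coincide with a low confidence value from the MT-CNN, which is what makes the verifier usable as a retrospective explanation of the model's output.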
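The NER-verification step can be sketched in the same spirit. Again, this is an assumption-laden illustration: the UMLS lookup is stubbed with a hypothetical dictionary from surface text to semantic group, and the error grouping shown (concept not found vs. semantic-type mismatch) is one plausible way to group the different kinds of inaccuracies, not necessarily the authors' taxonomy.

```python
# Hypothetical stand-in for a UMLS Metathesaurus lookup: maps surface text
# to the semantic group of its matching concept. In practice this would be
# a query against the UMLS API or a local Metathesaurus installation.
UMLS_SEMANTIC_GROUP = {
    "invasive ductal carcinoma": "Disease",
    "tamoxifen": "Chemical",
}

def verify_annotations(annotations):
    """Confirm NER annotations against UMLS and group the inaccuracies.

    `annotations` is a list of (span_text, predicted_type) pairs, e.g. the
    post-processed output of a HunFlair tagger on a pathology report."""
    confirmed = []
    errors = {"no_umls_concept": [], "type_mismatch": []}
    for text, predicted_type in annotations:
        group = UMLS_SEMANTIC_GROUP.get(text.lower())
        if group is None:
            errors["no_umls_concept"].append((text, predicted_type))
        elif group != predicted_type:
            errors["type_mismatch"].append((text, predicted_type))
        else:
            confirmed.append((text, predicted_type))
    return confirmed, errors

confirmed, errors = verify_annotations(
    [("Invasive ductal carcinoma", "Disease"), ("tamoxifen", "Disease")])
print(len(confirmed), "confirmed;", {k: len(v) for k, v in errors.items()})
```

Per the abstract's NER finding, the error group an incorrectly tagged entity falls into could then be correlated with the tagger's confidence for that span.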