Abstract

AbstractThe interpretation of medical images into a natural language is a developing field of artificial intelligence (AI) called image captioning. This field integrates two branches of artificial intelligence which are computer vision and natural language processing. This is a challenging topic that goes beyond object recognition, segmentation, and classification since it demands an understanding of the relationships between various components in an image and how these objects function as visual representations. The content-based image retrieval (CBIR) uses an image captioning model to generate captions for the user query image. The common architecture of medical image captioning systems consists mainly of an image feature extractor subsystem followed by a caption generation lingual subsystem. We aim in this paper to build an optimized model for histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens. For the image feature extraction subsystem, we did two evaluations; first, we tested 5 different vision models (VGG, ResNet, PVT, SWIN-Large, and ConvNEXT-Large) using (LSTM, RNN, and bidirectional-RNN) and then compare the vision models with (LSTM-without augmentation, LSTM-with augmentation and BioLinkBERT-Large as an embedding layer-with augmentation) to find the accurate one. Second, we tested 3 different concatenations of pairs of vision models (SWIN-Large, PVT_v2_b5, and ConvNEXT-Large) to get among them the most expressive extracted feature vector of the image. For the caption generation lingual subsystem, we tested a pre-trained language embedding model which is BioLinkBERT-Large compared to LSTM in both evaluations, to select from them the most accurate model. Our experiments showed that building a captioning system that uses a concatenation of the two models ConvNEXT-Large and PVT_v2_b5 as an image feature extractor, combined with the BioLinkBERT-Large language embedding model produces the best results among the other combinations.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.