Enhanced descriptive captioning model for histopathological patches

Samar Elbedwehy, T Medhat, Taher Hamza, Mohammed F Alrahmawy

doi:10.1007/s11042-023-15884-y


Open Access


Abstract

The interpretation of medical images into natural language is a developing field of artificial intelligence (AI) called image captioning. This field integrates two branches of AI: computer vision and natural language processing. It is a challenging topic that goes beyond object recognition, segmentation, and classification, because it demands an understanding of the relationships between the various components in an image and of how these objects function as visual representations. In content-based image retrieval (CBIR), an image captioning model generates captions for the user's query image. Medical image captioning systems commonly consist of an image feature extractor subsystem followed by a lingual caption generation subsystem. In this paper, we aim to build an optimized model for generating histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens. For the image feature extraction subsystem, we performed two evaluations. First, we tested five vision models (VGG, ResNet, PVT, SWIN-Large, and ConvNEXT-Large) with three caption decoders (LSTM, RNN, and bidirectional RNN), and then compared the vision models under three language configurations (LSTM without augmentation, LSTM with augmentation, and BioLinkBERT-Large as an embedding layer with augmentation) to find the most accurate one. Second, we tested the three pairwise concatenations of vision models (SWIN-Large, PVT_v2_b5, and ConvNEXT-Large) to find the pair that yields the most expressive image feature vector. For the lingual caption generation subsystem, we compared a pre-trained language embedding model, BioLinkBERT-Large, against LSTM in both evaluations to select the more accurate model. Our experiments showed that a captioning system that uses the concatenation of ConvNEXT-Large and PVT_v2_b5 as its image feature extractor, combined with the BioLinkBERT-Large language embedding model, produces the best results among all tested combinations.
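The abstract describes a two-part architecture: an image feature extractor (in the best configuration, the concatenation of ConvNEXT-Large and PVT_v2_b5 features) followed by a caption generator. The sketch below is a minimal, framework-free illustration of that data flow under stated assumptions, not the authors' implementation. `fuse_features`, `greedy_decode`, `toy_step`, and `NEXT_WORD` are hypothetical names. Fusion is assumed to be plain concatenation of pooled feature vectors, and decoding is assumed to be greedy. The toy decoder is a hand-written next-word table with a made-up vocabulary, standing in for a trained LSTM or BioLinkBERT-Large-based model. With real backbones, the pooled vectors would typically be 1536-dimensional (ConvNeXt-Large) and 512-dimensional (PVT-v2-b5), giving a 2048-dimensional fused vector. The example uses tiny toy vectors instead.

```python
from typing import Callable, Dict, List, Sequence

# Type of a decoder step: (fused image vector, tokens so far) -> next token.
StepFn = Callable[[Sequence[float], List[str]], str]


def fuse_features(feat_a: Sequence[float], feat_b: Sequence[float]) -> List[float]:
    """Concatenate two pooled feature vectors into one (d_a + d_b)-dim vector."""
    return list(feat_a) + list(feat_b)


def greedy_decode(image_vec: Sequence[float], step: StepFn, max_len: int = 20,
                  start: str = "<start>", end: str = "<end>") -> List[str]:
    """Greedy decoding: append the step function's chosen token until
    `end` is produced or `max_len` tokens have been generated."""
    tokens = [start]
    for _ in range(max_len):
        nxt = step(image_vec, tokens)
        if nxt == end:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start token


# Hypothetical stand-in for a trained LSTM or BioLinkBERT-based decoder:
# a fixed next-word table with a made-up vocabulary.
NEXT_WORD: Dict[str, str] = {"no": "tumor", "tumor": "cells", "cells": "present"}


def toy_step(image_vec: Sequence[float], tokens: List[str]) -> str:
    if tokens[-1] == "<start>":
        # Image conditioning happens only here: mean activation > 0.5 -> "tumor".
        mean = sum(image_vec) / len(image_vec)
        return "tumor" if mean > 0.5 else "no"
    # A previous word missing from the table ends the caption.
    return NEXT_WORD.get(tokens[-1], "<end>")


if __name__ == "__main__":
    convnext_feat = [0.9, 0.7, 0.8]  # toy stand-in for ConvNeXt-Large features
    pvt_feat = [0.6, 0.4]            # toy stand-in for PVT_v2_b5 features
    fused = fuse_features(convnext_feat, pvt_feat)
    print(len(fused))                                # 5
    print(" ".join(greedy_decode(fused, toy_step)))  # tumor cells present
```

Concatenation keeps each backbone's features intact and leaves it to the decoder to learn how to weight them, at the cost of a wider input layer. In a real system, `toy_step` would be replaced by a trained model's next-token prediction, and beam search could be used in place of greedy decoding.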


Journal: Multimedia Tools and Applications
Publication Date: Jun 1, 2023
Citations: 1
License type: CC BY 4.0


Similar Papers

Deep Learning in Natural Language Generation from Images
Xiaodong He ... Li Deng
01 Jan 2018

Computer Vision and Natural Language Processing
Peratham Wiriyathammabhum ... Yiannis Aloimonos
ACM Computing Surveys | VOL. 49
12 Dec 2016

Exploring better image captioning with grid features
Jie Yan ... Yanming Guo
Complex & Intelligent Systems | VOL. 10
10 Feb 2024

Chinese Image Caption Generation via Visual Attention and Topic Modeling
Maofu Liu ... Lingjun Li
IEEE Transactions on Cybernetics | VOL. 52
22 Jun 2020
