Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos

Nouar AlDahoul, Hezerul Abdul Karim, Myles Joshua Toledo Tan, Mhd Adel Momo, Jamie Ledesma Fermin

doi:10.1007/s11042-024-19089-9

Abstract

Laparoscopic videos guide surgeons who operate through narrow tubes inserted into the abdomen, avoiding large incisions in the skin. The videos captured by the camera are prone to numerous distortions, such as uneven illumination, motion blur, defocus blur, smoke, and noise, which degrade visual quality. Automatic detection and identification of these distortions is important for enhancing the quality of laparoscopic videos and avoiding errors during surgery. Video quality assessment includes two stages: classification of the distortions affecting the video frames to identify their types, and ranking of the distortions to estimate their intensity levels. The dataset of laparoscopic videos generated for the ICIP2020 challenge was used for training, validating, and testing the proposed solution. The dataset is difficult because it has five categories of distortion and four levels of severity; the presence of multiple distortion categories in a single video is considered its most challenging aspect. The work presented in this paper addresses the multi-label distortion classification and ranking problem and aims to improve on the performance of existing distortion classification solutions. A vision transformer, a deep learning model, was used to extract informative features by transferring learning and representations from the general domain to the medical domain (laparoscopic videos). Six parallel multilayer perceptron (MLP) classifiers were attached to the vision transformer for distortion classification and ranking. Experiments showed that the proposed solution outperforms existing distortion classification methods in terms of average accuracy (89.7%), average F1 score for single distortions (94.18%), and average F1 score for both single and multiple distortions (96.86%). It can also rank the distortions with an average accuracy of 79.22% and an average F1 score of 78.44%. Hence, the high performance of the proposed method opens the door to integrating the solution into an intelligent video enhancement system.
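
To make the "six parallel MLP classifiers" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the six heads divide the work, so the split used here is an assumption: one multi-label head for the five distortion types, plus one four-level severity head per distortion. The feature vector is a random stand-in for a vision transformer embedding, and the names `FEATURE_DIM`, `HIDDEN_DIM`, `build_heads`, and `predict` are hypothetical.

```python
import math
import random

# Five distortion categories and four severity levels, as stated in the abstract.
DISTORTIONS = ["uneven illumination", "motion blur", "defocus blur", "smoke", "noise"]
NUM_LEVELS = 4
FEATURE_DIM = 16   # hypothetical; a real ViT embedding is much larger
HIDDEN_DIM = 8     # hypothetical hidden width

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Randomly initialised two-layer MLP stored as plain weight lists."""
    def layer(n_in, n_out):
        return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"w1": layer(in_dim, hidden_dim), "w2": layer(hidden_dim, out_dim)}

def matvec(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mlp_forward(mlp, x):
    hidden = [max(0.0, h) for h in matvec(mlp["w1"], x)]  # ReLU activation
    return matvec(mlp["w2"], hidden)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def build_heads(rng):
    # ASSUMPTION: 1 multi-label type head + 5 per-distortion severity heads = 6 heads.
    heads = {"types": make_mlp(FEATURE_DIM, HIDDEN_DIM, len(DISTORTIONS), rng)}
    for name in DISTORTIONS:
        heads[name] = make_mlp(FEATURE_DIM, HIDDEN_DIM, NUM_LEVELS, rng)
    return heads

def predict(heads, features, threshold=0.5):
    # Independent sigmoids allow several distortions to be flagged in one frame.
    type_probs = [sigmoid(z) for z in mlp_forward(heads["types"], features)]
    result = {}
    for name, prob in zip(DISTORTIONS, type_probs):
        level_probs = softmax(mlp_forward(heads[name], features))
        result[name] = {
            "present": prob >= threshold,
            "severity_level": level_probs.index(max(level_probs)) + 1,  # 1..4
        }
    return result

rng = random.Random(0)
heads = build_heads(rng)
features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]  # stand-in for a ViT embedding
prediction = predict(heads, features)
```

All heads read the same shared feature vector, which is what makes them parallel: the backbone runs once per frame, and each head adds only a small amount of computation. With untrained random weights the predictions are meaningless; the sketch only illustrates the data flow.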

Journal: Multimedia Tools and Applications
Publication Date: Apr 23, 2024
License type: CC BY 4.0
