Frame Segmentation Research Articles

Robot-assisted surgery has brought a paradigm shift to minimally invasive surgery. However, tracking the position of surgical tools in a surgical scene is very difficult, so accurate detection and identification of the tools is crucial. Deep learning-based semantic segmentation of surgical video frames can aid this task. Furthermore, because these instruments offer limited working and viewing areas, there is a higher chance of complications from tissue injuries (e.g., tissue scars and tears). We present an application that uses image segmentation, together with digital inpainting algorithms, to remove surgical instruments from laparoscopic/endoscopic video. We segment the surgical instruments with a modified U-Net architecture (U-NetPlus), which consists of a pre-trained VGG11 or VGG16 encoder and a redesigned decoder. In the decoder, the transposed convolution operation is replaced by an up-sampling operation based on nearest-neighbor interpolation. Nearest-neighbor interpolation has no weights to learn, and it eliminates the artifacts generated by the transposed convolution. In addition, we use a very fast and adaptable data augmentation technique to further enhance performance. Using the previously acquired tool segmentation masks and either previous instrument-containing frames or instrument-free reference frames, the tool removal algorithms fill in (i.e., inpaint) the region covered by the instrument segmentation mask. We have shown the effectiveness of the proposed surgical tool segmentation/removal algorithms on a robotic instrument dataset from the MICCAI 2015 and 2017 EndoVis Challenge.
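
The nearest-neighbor up-sampling step described above can be illustrated with a minimal sketch. This is not the U-NetPlus code: the function name `upsample_nearest` and the use of plain nested lists for a single-channel feature map are assumptions made for illustration, and a real decoder would apply the same idea to multi-channel tensors through a deep learning framework. The point it shows is that the operation only copies existing values, so there is nothing to train.

```python
def upsample_nearest(feature_map, scale=2):
    """Upsample a 2D grid by an integer factor using nearest-neighbor interpolation.

    Hypothetical helper for illustration; no learned weights are involved.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    out = []
    for row in feature_map:
        # Repeat each value `scale` times along the width.
        wide_row = [value for value in row for _ in range(scale)]
        # Repeat the widened row `scale` times along the height.
        out.extend(list(wide_row) for _ in range(scale))
    return out


small = [[1, 2],
         [3, 4]]
large = upsample_nearest(small)
for row in large:
    print(row)
```

Each input value becomes a 2x2 block in the output, so a 2x2 map grows to 4x4 without introducing any new values.
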
Using our U-NetPlus architecture on the MICCAI 2017 challenge dataset, we report a Dice score of 90.20% for binary segmentation, 76.26% for instrument part segmentation, and 46.07% for instrument type (i.e., all instruments) segmentation, outperforming the results of earlier techniques tested on these data. In addition, we demonstrated successful tool removal on surgical tool-free videos to which artificially generated moving surgical tools had been added. Our application successfully segments and removes the surgical tool to reveal the background tissue otherwise hidden by the tool, producing results that are visually similar to the actual data.
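
For readers unfamiliar with the metric, the Dice score for a binary mask is 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B the ground truth. The sketch below is a minimal illustration under stated assumptions: the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are all choices made here, and the abstract does not say how scores were averaged across frames or classes.

```python
def dice_coefficient(pred, target):
    """Dice score between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; returns a value in [0, 1].
    """
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)
print(f"Dice: {score:.4f}")
```

Here two pixels overlap and each mask has three foreground pixels, so the score is 2 * 2 / (3 + 3), roughly 0.667.
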

In the day-to-day life of communities, good communication channels are crucial for mutual understanding. The hearing-impaired community uses sign language, a visual and gestural language that differs from written and spoken languages in orientation and expression. Although sign language is an excellent medium of communication among hearing-impaired persons, a communication barrier remains between hearing-impaired and non-disabled people. To address this issue, researchers have proposed sign-language-to-text translation systems for English and other European languages. The goal of this research is to design and develop a system that converts Ethiopian sign language into Amharic digital text. The proposed system combines two deep learning components: a pretrained deep learning model and a Long Short-Term Memory (LSTM) network. The pretrained model extracts features from single frame images, while the LSTM extracts sequence information from the sequence of image frames that makes up a specific sign. The dataset used to train the algorithms was gathered in video format from Addis Ababa University. Before the dataset was fed to the deep learning models, it was preprocessed through cleaning and video-to-image frame segmentation. The system was trained, validated, and tested using 80%, 10%, and 10%, respectively, of the 2475 images created during the preprocessing step. Two pretrained deep learning models, EfficientNetB0 and ResNet50, were used in this investigation, and they attained an accuracy of 72.79%. In terms of precision and F1-score, ResNet50 outperformed EfficientNetB0. A graphical user interface prototype was created for the proposed system, and the best-performing model was chosen and implemented in it. Based on the outcomes of the experiment, the proposed system can serve as a starting point for other researchers to improve upon.
The system could be improved further with more high-quality training data and higher-performance training hardware, such as GPU-enabled computers.
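
The 80%/10%/10% split of the 2475 images can be worked through with a short sketch. The function name `split_indices`, the fixed shuffle seed, and the choice to give the rounding remainder to the test set are assumptions made here; the abstract does not say how the split was rounded or whether frames from the same video were kept together.

```python
import random


def split_indices(n_items, train_pct=80, val_pct=10, seed=0):
    """Shuffle item indices and split them into train/validation/test lists.

    Hypothetical helper for illustration; integer arithmetic avoids
    floating-point rounding surprises, and the remainder goes to the test set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(2475)
print(len(train), len(val), len(test))  # 1980 247 248
```

Because 10% of 2475 is not a whole number, the validation and test sets cannot both be exactly 10%; under the assumptions above they come out as 247 and 248 images.
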


Articles published on Frame Segmentation

Inpainting surgical occlusion from laparoscopic video sequences for robot-assisted interventions.

Crane payload localisation for curtain wall installation: A markerless computer vision approach

A Four-Point Camera Calibration Method for Sport Videos

Complementary Coarse-to-Fine Matching for Video Object Segmentation

Fully automatic tracking of native glenohumeral kinematics from stereo-radiography

Defect detection and response non-uniformity correction of a monocentric camera based on fiber optic relay imaging.

Eye-Blink Event Detection Using a Neural-Network-Trained Frame Segment for Woman Drivers in Saudi Arabia

A frame orientation optimisation method for consistent interpretation of kinematic signals

Abnormal event detection model using an improved ResNet101 in context aware surveillance system

Attribution and the discourse structure of reports

Global video object segmentation with spatial constraint module

Enhanced security using video summarization for surveillance system using deep LSTM model with K-means clustering technique

Features of forming process of frame segments blanks based on stretch bending technology of extruded section made of high-strength aluminum alloys

An Efficient Attention-Based Strategy for Anomaly Detection in Surveillance Video

LED Screen-Based Intelligent Hand Gesture Recognition System

Coherence-aware context aggregator for fast video object segmentation

Classification of Valvular Regurgitation Using Echocardiography

A coarse-to-fine segmentation frame for polyp segmentation via deep and classification features

Saying the Unseen: Video Descriptions via Dialog Agents.

A Generic Approach towards Amharic Sign Language Recognition
