Multiple Benchmark Research Articles

AbstractDetecting text within medical images presents a formidable challenge in the domain of computer vision due to the intricate nature of textual backgrounds, the dense text concentration, and the possible existence of extreme aspect ratios. This paper introduces an effective and precise text detection system tailored to address these challenges. The system incorporates an optimized segmentation module, a trainable post-processing method, and leverages a vision-language pre-training model (oCLIP). Specifically, our segmentation head integrates three essential components: the Feature Pyramid Network (FPN) module, which combines a residual structure and channel attention mechanism; the Efficient Feature Enhancement Module (EFEM); and the Multi-Scale Feature Fusion with RSEConv (MSFM-RSE), designed specifically for multi-scale feature fusion based on RSEConv. By introducing a residual structure and channel attention mechanism into the FPN module, the convolutional layers are replaced with RSEConv layers that employ a channel attention mechanism, further augmenting the representational capacity of the feature maps. The EFEM, designed as a cascaded U-shaped module, incorporates a spatial attention mechanism to introduce multi-level information, thereby enhancing segmentation performance. Subsequently, the MSFM-RSE adeptly amalgamates features from various depths and scales of the EFEM to generate comprehensive final features tailored for segmentation purposes. Additionally, a post-processing module employs a differentiable binarization strategy, allowing the segmentation network to dynamically determine the binarization threshold. Building on the system’s improvement, we introduce a vision-language pre-training model that undergoes extensive training on various visual language understanding tasks. This pre-trained model acquires detailed visual and semantic representations, further reinforcing both the accuracy and robustness in text detection when integrated with the segmentation module. The performance of our proposed model was evaluated through experiments on medical text image datasets, demonstrating excellent results. Multiple benchmark experiments validate its superior performance in comparison to existing methods. Code is available at: https://github.com/csworkcode/VLDBNet.

Read full abstract

Multiple Benchmark Research Articles

Related Topics

Articles published on Multiple Benchmark

Feature Distribution Matching by Optimal Transport for Effective and Robust Coreset Selection

ResDiff: Combining CNN and Diffusion Model for Image Super-resolution

Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

Self-Supervised Representation Learning with Meta Comprehensive Regularization

Offline Model-Based Optimization via Policy-Guided Gradient Search

FeatWalk: Enhancing Few-Shot Classification through Local View Leveraging

Multi-Cross Sampling and Frequency-Division Reconstruction for Image Compressed Sensing

Traffic pattern-aware elevator dispatching via deep reinforcement learning

CNN Based Face Emotion Recognition System for Healthcare Application

Multi-GPU work sharing in a task-based dataflow programming model

Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking

UGNet: Uncertainty aware geometry enhanced networks for stereo matching

Uncertainty-driven active developmental learning

Out-of-Domain Generalization From a Single Source: An Uncertainty Quantification Approach.

RAgE: Robust Age Estimation Through Subject Anchoring With Consistency Regularisation.

Organ-Agnostic Whole Slide Image Analysis using Self-Supervised Transfer Learning

MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning.

Robust Recommender Systems with Rating Flip Noise

Enhancing the Zebra Optimization Algorithm with Chaotic Sinusoidal Map for Versatile Optimization

Enhancing medical text detection with vision-language pre-training and efficient segmentation

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Multiple Benchmark Research Articles

Related Topics

Articles published on Multiple Benchmark

Feature Distribution Matching by Optimal Transport for Effective and Robust Coreset Selection

ResDiff: Combining CNN and Diffusion Model for Image Super-resolution

Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

Self-Supervised Representation Learning with Meta Comprehensive Regularization

Offline Model-Based Optimization via Policy-Guided Gradient Search

FeatWalk: Enhancing Few-Shot Classification through Local View Leveraging

Multi-Cross Sampling and Frequency-Division Reconstruction for Image Compressed Sensing

Traffic pattern-aware elevator dispatching via deep reinforcement learning

CNN Based Face Emotion Recognition System for Healthcare Application

Multi-GPU work sharing in a task-based dataflow programming model

Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking

UGNet: Uncertainty aware geometry enhanced networks for stereo matching

Uncertainty-driven active developmental learning

Out-of-Domain Generalization From a Single Source: An Uncertainty Quantification Approach.

RAgE: Robust Age Estimation Through Subject Anchoring With Consistency Regularisation.

Organ-Agnostic Whole Slide Image Analysis using Self-Supervised Transfer Learning

MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning.

Robust Recommender Systems with Rating Flip Noise

Enhancing the Zebra Optimization Algorithm with Chaotic Sinusoidal Map for Versatile Optimization

Enhancing medical text detection with vision-language pre-training and efficient segmentation