Image Modalities Research Articles

Background: Fluorescence microscopy (FM) is an important and widely adopted biological imaging technique. Segmentation is often the first step in quantitative analysis of FM images. Deep neural networks (DNNs) have become the state-of-the-art tools for image segmentation. However, their performance on natural images may collapse under certain image corruptions or adversarial attacks. This poses real risks to their deployment in real-world applications. Although the robustness of DNN models in segmenting natural images has been studied extensively, their robustness in segmenting FM images remains poorly understood.

Results: To address this deficiency, we have developed an assay that benchmarks robustness of DNN segmentation models using datasets of realistic synthetic 2D FM images with precisely controlled corruptions or adversarial attacks. Using this assay, we have benchmarked robustness of ten representative models such as DeepLab and Vision Transformer. We find that models with good robustness on natural images may perform poorly on FM images. We also find new robustness properties of DNN models and new connections between their corruption robustness and adversarial robustness. To further assess the robustness of the selected models, we have also benchmarked them on real microscopy images of different modalities without using simulated degradation. The results are consistent with those obtained on the realistic synthetic images, confirming the fidelity and reliability of our image synthesis method as well as the effectiveness of our assay.

Conclusions: Based on comprehensive benchmarking experiments, we have found distinct robustness properties of deep neural networks in semantic segmentation of FM images. Based on the findings, we have made specific recommendations on selection and design of robust models for FM image segmentation.
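The idea of benchmarking segmentation under precisely controlled corruptions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the assay described in the abstract: the toy disk image, the additive Gaussian noise model, the fixed-threshold `threshold_segmenter` standing in for a trained DNN, and all function names are hypothetical.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else inter / union

def add_gaussian_noise(image, sigma, rng):
    """Corrupt an image in [0, 1] with zero-mean Gaussian noise of std sigma."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def threshold_segmenter(image):
    """Hypothetical stand-in for a trained model: fixed-threshold foreground mask."""
    return image > 0.5

def robustness_curve(segment, image, mask, sigmas, seed=0):
    """Score the segmenter against the clean ground truth at each corruption level."""
    rng = np.random.default_rng(seed)
    return {s: iou(segment(add_gaussian_noise(image, s, rng)), mask) for s in sigmas}

# Toy synthetic image: a bright disk (foreground) on a dark background.
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2
image = np.where(mask, 0.8, 0.2)

curve = robustness_curve(threshold_segmenter, image, mask, [0.0, 0.1, 0.5])
for sigma, score in curve.items():
    print(f"sigma={sigma:.1f}  IoU={score:.3f}")
```

Because the ground-truth mask is known exactly for a synthetic image, the drop in IoU as `sigma` grows can be attributed to the corruption alone, which is the property that makes controlled synthetic data useful for this kind of benchmark.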

Analyzing, manipulating, and comprehending data from multiple sources (e.g., websites, software applications, files, or databases) and of diverse modalities (e.g., video, images, audio and text) has become increasingly important in many domains. Despite recent advances in multimodal classification (MC), several challenges remain, such as combining modalities of very diverse nature, finding the optimal feature engineering for each modality, and semantically aligning text and images. Accordingly, the main motivation of our research lies in devising a neural architecture that effectively processes and combines text, image, video and audio modalities, so that it performs well across different MC tasks. In this regard, the Multimodal Transformer (MulT) model is a cutting-edge approach often employed in multimodal supervised tasks. Although effective, it has a fixed architecture that limits its performance in specific tasks as well as its contextual understanding: it may struggle to capture fine-grained temporal patterns in audio or to model spatial relationships in images effectively. To address these issues, our research modifies and extends the MulT model in several aspects. Firstly, we leverage the Gated Multimodal Unit (GMU) module within the architecture to weigh modalities efficiently and dynamically at the instance level and to visualize the use of modalities. Secondly, to overcome the problem of vanishing and exploding gradients, we strategically place residual connections in the architecture. The proposed architecture is evaluated on two different and complex classification tasks: movie genre categorization (MGC) and multimodal emotion recognition (MER). The results are encouraging, as they indicate that the proposed architecture is competitive against state-of-the-art (SOTA) models in MGC, outperforming them by up to 2% on the Moviescope dataset and by 1% on the MM-IMDB dataset. Furthermore, in the MER task we employed the unaligned version of the datasets, which is considerably more difficult; we improve on SOTA accuracy by up to 1% on the IEMOCAP dataset and attain a competitive outcome on the CMU-MOSEI collection (Dai et al., 2021), outperforming SOTA results on several emotions.
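The instance-level weighting attributed to the Gated Multimodal Unit can be sketched for two modalities as follows. This is a minimal illustration of the commonly described bimodal GMU form (a sigmoid gate computed from the concatenated inputs, mixing two tanh projections), not the configuration used in the architecture above; the dimensions, random weights, and the name `gmu_fuse` are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_fuse(x_a, x_b, W_a, W_b, W_z):
    """Bimodal gated fusion: returns the fused features and the per-instance gate."""
    h_a = np.tanh(x_a @ W_a)  # hidden representation of modality A
    h_b = np.tanh(x_b @ W_b)  # hidden representation of modality B
    # The gate sees both raw inputs, so the mixing weight differs per instance.
    z = sigmoid(np.concatenate([x_a, x_b], axis=-1) @ W_z)
    return z * h_a + (1.0 - z) * h_b, z

# Hypothetical sizes: 4 instances, 8-dim and 6-dim inputs, 5-dim fused output.
rng = np.random.default_rng(0)
batch, d_a, d_b, d_h = 4, 8, 6, 5
x_a = rng.normal(size=(batch, d_a))
x_b = rng.normal(size=(batch, d_b))
W_a = rng.normal(size=(d_a, d_h))
W_b = rng.normal(size=(d_b, d_h))
W_z = rng.normal(size=(d_a + d_b, d_h))

fused, gate = gmu_fuse(x_a, x_b, W_a, W_b, W_z)
print(fused.shape, gate.mean(axis=1))  # gate values near 1 favor modality A
```

Inspecting `gate` per instance is what makes it possible to visualize how much each modality contributes to a given prediction.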

Articles published on Image Modalities

Few-Shot Image Classification of Crop Diseases Based on Vision-Language Models.

MEEAFusion: Multi-Scale Edge Enhancement and Joint Attention Mechanism Based Infrared and Visible Image Fusion.

CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation

GLFuse: A Global and Local Four-Branch Feature Extraction Network for Infrared and Visible Image Fusion

Semi-supervised Cross-modal Hashing with Joint Hyperboloid Mapping

Multimodal Image Confidence: A Novel Method for Tumor and Organ Boundary Representation

FCLFusion: A frequency-aware and collaborative learning for infrared and visible image fusion

Quaternion-based 2D-DOST and stacked principal component analysis network for multimodal face recognition

A New Approach for Effective Retrieval of Medical Images: A Step towards Computer-Assisted Diagnosis.

BCMFIFuse: A Bilateral Cross-Modal Feature Interaction-Based Network for Infrared and Visible Image Fusion

A systematic evaluation of computational methods for cell segmentation.

Military Image Captioning for Low-Altitude UAV or UGV Perspectives

Benchmarking robustness of deep neural networks in semantic segmentation of fluorescence microscopy images

Enhanced multimodal medical image fusion via modified DWT with arithmetic optimization algorithm

The effect of flow-derived mechanical cues on the growth and morphology of platelet aggregates under low, medium, and high shear rates

Automatic movie genre classification & emotion recognition via a BiProjection Multimodal Transformer

Progressive discrepancy elimination for visible–infrared person re-identification

A multi-task framework based on decomposition for multimodal named entity recognition

Cross-Modality Medical Image Segmentation via Enhanced Feature Alignment and Cross Pseudo Supervision Learning.

Underground coal gangue recognition based on composite fusion of feature and decision
