Common Representation Research Articles

Most cross-modal retrieval methods assume that the multi-modal training data is complete and has a one-to-one correspondence. In the real world, however, multi-modal data often suffers from missing modality information because of uncertainty in data collection and storage, which limits the practical application of existing cross-modal retrieval methods. Some solutions generate the missing modality data from a single pseudo sample, but the limited semantic information such a sample provides may lead to incomplete semantic restoration and sub-optimal retrieval results. To address this challenge, this article proposes Incomplete Cross-Modal Retrieval with Deep Correlation Transfer (ICMR-DCT), a method that can robustly model incomplete multi-modal data and dynamically capture the adjacency semantic correlation for cross-modal retrieval. Specifically, we construct an intra-modal graph attention-based auto-encoder to learn modality-invariant representations by performing semantic reconstruction through intra-modality adjacency correlation mining. Then, we design dual cross-modal alignment constraints to project multi-modal representations into a common semantic space, thus bridging the heterogeneous modality gap and enhancing the discriminability of the common representation. We further introduce semantic preservation to enhance adjacency semantic information and achieve cross-modal semantic correlation. Moreover, we propose a nearest-neighbor weighting integration strategy with cross-modal correlation transfer to generate the missing modality data according to inter-modality mapping relations and adjacency correlations between each sample and its neighbors, which improves the robustness of our method against incomplete multi-modal training data. Extensive experiments on three widely used benchmark datasets demonstrate the superior performance of our method in cross-modal retrieval tasks under both complete and incomplete retrieval scenarios. The datasets and source code we used are available at https://github.com/shidan0122/DCT.git.
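
The abstract does not spell out the nearest-neighbor weighting step, so the following is only a minimal sketch of the general idea, not the ICMR-DCT implementation. It assumes features are already embedded as NumPy arrays, uses squared Euclidean distance, and weights neighbors with a softmax over negative distances. The function name `impute_missing_modality` and the parameters `k` and `temperature` are hypothetical choices made for illustration.

```python
import numpy as np

def impute_missing_modality(query_a, paired_a, paired_b, k=3, temperature=1.0):
    """Estimate modality-B features for samples that only have modality A.

    query_a:  (n_query, d_a)  modality-A features of incomplete samples
    paired_a: (n_paired, d_a) modality-A features of complete samples
    paired_b: (n_paired, d_b) modality-B features of the same complete samples
    All names and the weighting scheme are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_paired).
    diffs = query_a[:, None, :] - paired_a[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)

    # Indices and distances of the k nearest complete samples per query.
    k = min(k, paired_a.shape[0])
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dists = np.take_along_axis(dists, nn_idx, axis=1)

    # Softmax over negative distances: closer neighbors get larger weights.
    logits = -nn_dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted average of the neighbors' modality-B features.
    return np.einsum("qk,qkd->qd", weights, paired_b[nn_idx])
```

Averaging over several neighbors, rather than copying a single pseudo sample, is what lets the estimate draw on more than one source of semantic information, which is the motivation the abstract gives.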

Deep learning techniques have been widely applied to unimodal medical image analysis, with significant classification accuracy observed. However, real-world diagnosis of some chronic diseases such as breast cancer often requires multimodal data streams with different modalities of visual and textual content. Mammography, magnetic resonance imaging (MRI) and image-guided breast biopsy are a few of the multimodal visual streams physicians consider when isolating cases of breast cancer. Unfortunately, most studies applying deep learning techniques to classification problems in digital breast images have narrowed their scope to unimodal samples. This is understandable given the challenging nature of multimodal image abnormality classification, where the learned high-dimensional heterogeneous features must be fused and projected into a common representation space. This paper presents a novel deep learning approach based on a dual/twin convolutional neural network (TwinCNN) framework to address the challenge of breast cancer image classification from multiple modalities. First, modality-based feature learning was achieved by extracting both low- and high-level features using the networks embedded in TwinCNN. Second, to address the notorious problem of high dimensionality associated with the extracted features, a binary optimization method was adapted to eliminate non-discriminant features in the search space. Furthermore, a novel feature fusion method was applied that computationally leverages the ground-truth and predicted labels for each sample to enable multimodal classification. To evaluate the proposed method, digital mammography images and digital histopathology breast biopsy samples from the benchmark datasets MIAS and BreakHis, respectively, were used. Experimental results showed that the classification accuracy and area under the curve (AUC) for the single modalities were 0.755 and 0.861871 for histology, and 0.791 and 0.638 for mammography. The study also investigated the classification accuracy resulting from the fused feature method, and the results obtained were 0.977, 0.913, and 0.667 for histology, mammography, and multimodality respectively. The findings confirm that multimodal image classification based on a combination of image features and predicted labels improves performance. In addition, the study shows that feature dimensionality reduction based on a binary optimizer supports the elimination of non-discriminant features capable of bottlenecking the classifier.
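
The abstract does not name the binary optimizer it uses, so the sketch below only illustrates the general idea of selecting features with a binary mask. As a stand-in, it uses a simple bit-flip hill climb with a nearest-centroid classifier as the fitness function. The names `fitness` and `select_features`, the 0.01 size penalty, and the search procedure itself are assumptions made for illustration and should not be read as the paper's method.

```python
import numpy as np

def fitness(mask, X, y):
    """Score a boolean feature mask (illustrative assumption):
    nearest-centroid training accuracy minus a small penalty on mask size."""
    if not mask.any():
        return -np.inf  # an empty feature set is never acceptable
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    # Squared distance from every sample to every class centroid.
    d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[d.argmin(axis=1)]
    accuracy = (pred == y).mean()
    return accuracy - 0.01 * mask.mean()

def select_features(X, y, n_iter=200, seed=0):
    """Bit-flip hill climb over binary masks, a stand-in for a binary optimizer."""
    rng = np.random.default_rng(seed)
    mask = np.ones(X.shape[1], dtype=bool)  # start with every feature kept
    best = fitness(mask, X, y)
    for _ in range(n_iter):
        candidate = mask.copy()
        j = rng.integers(X.shape[1])
        candidate[j] = ~candidate[j]  # flip one feature in or out
        score = fitness(candidate, X, y)
        if score >= best:  # accept flips that do not hurt the score
            mask, best = candidate, score
    return mask
```

The returned mask can be applied as `X[:, mask]` to drop the columns the search judged non-discriminant before the features are passed on to a classifier.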

Articles published on Common Representation

JORA: Weakly Supervised User Identity Linkage via Jointly Learning to Represent and Align.

RDPGNet: A road extraction network with dual-view information perception based on GCN

Learning attentional templates for value-based decision-making

A semi-supervised cross-modal memory bank for cross-modal retrieval

IF THIS THEN THAT Broken Linear Logic. Rethinking and Representing the Design Process

Want a More Effective School? Better Start with the Culture

Is the “Common Home” Metaphor Adequate and Useful for an “Integral Ecology” Theology in Modern Times?

Asymmetric Supervised Fusion-Oriented Hashing for Cross-Modal Retrieval.

Robust detection of clinically relevant features in single-cell RNA profiles of patient-matched fresh and formalin-fixed paraffin-embedded (FFPE) lung cancer tissue

Knowledge enhancement for speech emotion recognition via multi-level acoustic feature

CMRsim-A python package for cardiovascular MR simulations incorporating complex motion and flow.

Incomplete Cross-Modal Retrieval with Deep Correlation Transfer

Cortical quantity representations of visual numerosity and timing overlap increasingly into superior cortices but remain distinct

A twin convolutional neural network with hybrid binary optimizer for multimodal breast cancer digital image classification

Turning the Light Switch on Binding: Prefrontal Activity for Binding and Retrieval in Action Control.

Dynamic 3D Point Cloud Sequences as 2D Videos.

The importance of observing the master's hand: Action Observation Training promotes the acquisition of new musical skills.

Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning.

Subgraph Propagation and Contrastive Calibration for Incomplete Multiview Data Clustering.

CCGIB: A Cross-Channel Graph Information Bottleneck Principle.
