Abstract

This work presents a novel cross-modality method for capturing human visual perception in stereoscopic image quality ranking (SIQR), aiming to learn latent biological representations and thinking tokens from brain activity. The core idea is to directly project electroencephalogram (EEG) information onto an SIQR neural network model, rather than simulating complex neuronal connections. To this end, we establish a multimodal brain-visual dataset and propose a Cross-GAN model comprising a binocular attention-based fusion (BAF) image modality encoder, a multiscale spatiotemporal (MST) EEG modality encoder, and a multimodal feature generation GAN (MF-GAN). In the BAF image encoder, a progressive interactive attention fusion is designed to highlight salient image regions from the monocular view to the binocular view, contributing to a global description of both the single views and the fused view. In the MST EEG encoder, a spectral normalization constraint is adopted to ensure Lipschitz continuity, and multiscale convolution blocks are introduced to capture EEG representations from a spatiotemporal perspective. Finally, the unimodal image and EEG features are fed into the MF-GAN to construct an optimal brain-visual manifold under the control of the generative loss function. Extensive experiments on the brain-visual multimodal SIQR database demonstrate that Cross-GAN improves performance by projecting brain responses onto the image evaluation model in a cross-modal manner, without requiring EEG trials at inference time.
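
As a rough illustration of the kind of EEG encoder the abstract describes, the sketch below builds a multiscale spatiotemporal convolutional module whose layers are wrapped in PyTorch's spectral normalization to constrain their Lipschitz constants. The electrode count, kernel sizes, and feature width are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class MSTEEGEncoder(nn.Module):
    """Hypothetical sketch of a multiscale spatiotemporal (MST) EEG encoder.

    Temporal convolutions with several kernel sizes capture dynamics at
    multiple time scales, a spatial convolution mixes information across
    electrodes, and every convolution is spectrally normalized so its
    Lipschitz constant stays bounded (an assumption about how the paper's
    constraint is applied).
    """

    def __init__(self, n_electrodes=32, n_features=64, kernel_sizes=(7, 15, 31)):
        super().__init__()
        # One temporal branch per kernel size, applied along the time axis.
        self.temporal_branches = nn.ModuleList([
            spectral_norm(nn.Conv2d(1, n_features, kernel_size=(1, k), padding=(0, k // 2)))
            for k in kernel_sizes
        ])
        # Spatial convolution spanning all electrodes at once.
        self.spatial = spectral_norm(
            nn.Conv2d(n_features * len(kernel_sizes), n_features,
                      kernel_size=(n_electrodes, 1))
        )
        self.act = nn.ELU()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x):
        # x: (batch, 1, electrodes, time_samples)
        multiscale = torch.cat(
            [self.act(branch(x)) for branch in self.temporal_branches], dim=1
        )
        spatial = self.act(self.spatial(multiscale))
        return self.pool(spatial).flatten(1)  # (batch, n_features)


# Example usage with a dummy 2-second EEG segment sampled at 250 Hz.
eeg = torch.randn(4, 1, 32, 500)
features = MSTEEGEncoder()(eeg)
print(features.shape)  # torch.Size([4, 64])
```

In the full model, such an EEG feature vector would be paired with the BAF image features and passed to the MF-GAN, which is trained so that image features alone can approximate the joint brain-visual manifold at test time.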
