Motivated by the coarse-to-fine (low-to-high spatial frequency) visual processing and fusion mechanism of the human brain, we propose a stereo image quality assessment (SIQA) network that combines coarse-to-fine feedback guidance with an adaptive dominant-eye mechanism. The proposed network consists of two main sub-network streams, each with three branches that extract low, middle, and high spatial frequency information in parallel. So that the high-level features in the low spatial frequency branch can guide the low-level features in the high spatial frequency branch, we propose an information feedback guidance module (IFGM), which implements a top-down guidance mechanism in each sub-network stream. In addition, following the theory of ocular dominance in the human visual system (HVS), we design an adaptive bi-directional parallax-based binocular fusion module (BPBFM), which synthesizes two types of fusion features by treating the left-view and the right-view features in turn as the dominant-eye input. Finally, to better estimate the perceptual quality of stereo images, we use an ensemble model with two multi-layer perceptrons (MLPs) to predict a quality score from each type of fusion feature, and we design a weighted fusion strategy to combine the two scores. Experimental results on four public stereo image datasets show that the proposed method outperforms mainstream metrics.
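The final weighted fusion step can be illustrated with a minimal sketch. The abstract does not say how the weights are obtained, so this example simply assumes two non-negative weights are given and normalizes them into a convex combination; the function name `fuse_quality_scores` and the example values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fuse_quality_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-branch quality scores into one score.

    Assumption (not from the paper): the weights are non-negative and are
    normalized to sum to 1, so the result is a convex combination of the
    input scores.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Weighted average of the branch scores.
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical scores from the left-dominant and right-dominant fusion branches.
left_dominant_score = 0.8
right_dominant_score = 0.6
fused = fuse_quality_scores([left_dominant_score, right_dominant_score], [3.0, 1.0])
```

Because the weights are normalized, the fused score always lies between the two branch scores, which keeps it on the same scale as the individual predictions.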