This paper proposes a framework built on two feature extraction networks and a multilevel feature fusion (MFF) network. The method obtains multilevel degradation features and, combined with the human visual perception system, captures the local and global information they contain, which helps predict the quality of distorted images. First, a restorative generative adversarial network (GAN) generates a restored image that approximates the reference image, and EfficientNet extracts the multilevel degradation features of the distorted images together with the features of the restored image. Second, the features extracted by EfficientNet are fed into the MFF network, where they are fully expressed through the top-down, bottom-up, and third edge joining methods, so that they provide more low-level detail and high-level semantic information for predicting image quality scores. Finally, after the MFF stage, the framework calculates a score for each branch feature and averages these scores to obtain the quality score. Experimental results on five standard databases show that the method greatly improves prediction accuracy.
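The final step, scoring each branch and averaging the results, can be illustrated with a minimal sketch. The abstract does not specify how a branch feature is turned into a score, so the global average pooling and the linear head below are assumptions made for illustration only, and the names `global_average_pool`, `branch_score`, and `average_quality_score` are hypothetical.

```python
from statistics import fmean


def global_average_pool(feature_map):
    """Collapse a C x H x W nested-list feature map into a length-C vector."""
    return [fmean(v for row in channel for v in row) for channel in feature_map]


def branch_score(feature_map, weights, bias=0.0):
    """Hypothetical linear head (an assumption): pooled features -> scalar score."""
    pooled = global_average_pool(feature_map)
    if len(pooled) != len(weights):
        raise ValueError("one weight per channel is required")
    return sum(w * p for w, p in zip(weights, pooled)) + bias


def average_quality_score(branch_features, branch_weights):
    """Score every fused branch separately, then average the branch scores."""
    scores = [branch_score(f, w) for f, w in zip(branch_features, branch_weights)]
    return fmean(scores), scores


# Two toy branches standing in for fused MFF outputs at different levels.
low_level = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # 2 x 2 x 2
high_level = [[[2.0]]]                                            # 1 x 1 x 1
quality, per_branch = average_quality_score(
    [low_level, high_level],
    [[1.0, 1.0], [1.0]],
)
```

In a trained model the head weights would be learned rather than fixed by hand; the sketch only shows how per-branch scores combine into a single averaged quality score.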