Abstract
An edge map is a feature map that represents the contours of the objects in an image. A previous Single Image Super Resolution (SISR) method that used edge maps achieved a notable improvement in SSIM. Unlike SISR, Video Super Resolution (VSR) operates on video, which consists of consecutive frames with temporal features. Accordingly, some VSR models adopt motion estimation and motion compensation to exploit spatio-temporal feature maps. In contrast to these models, we take a different approach by adding edge structure information and a related post-processing step to an existing model. Our model, "Video Super Resolution Using a Selective Edge Aggregation Network (SEAN)", consists of two stages. First, the model selectively generates an edge map from the target frame and its neighboring frames. At this stage, we adopt a magnitude loss function so that the output of SEAN learns the contours of each object more clearly. Second, the final output is generated by the refinement (post-processing) module. SEAN produces more distinct object contours and better color correction than other existing models.
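The abstract refers to a magnitude loss used while learning object contours, but its exact form is not spelled out here. The following is a minimal sketch, assuming it compares Sobel gradient magnitudes of the super-resolved frame and the ground-truth frame; the function names and the use of an L1 distance are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def gradient_magnitude(x: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient magnitude via Sobel filters; x has shape (N, C, H, W)."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=x.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, sobel_x.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(x, sobel_y.repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def magnitude_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # Assumption: penalize the L1 distance between edge magnitudes so the network
    # is pushed toward sharper, better-localized object contours.
    return F.l1_loss(gradient_magnitude(sr), gradient_magnitude(hr))
```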
Highlights
The Video Super Resolution (VSR) task is usually performed using information from consecutive frames
When a neighboring frame shows little motion change, there is no significant difference between the edge maps obtained from each frame
The proposed refinement module is applied to the VSR field for the first time, and because it is designed to be highly versatile, it is compatible with other VSR models of the user's choice (see the sketch after this list)
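Since the refinement stage is presented as a plug-in that can follow any VSR backbone, the snippet below sketches one way such a module could be attached. The class name, layer count, and the `vsr_backbone` interface are illustrative assumptions rather than the authors' exact architecture; the global skip connection reflects the stated goal of limiting information loss.

```python
import torch
import torch.nn as nn

class RefinementModule(nn.Module):
    """Residual post-processing block that refines an already upscaled frame."""
    def __init__(self, channels: int = 3, features: int = 64, num_layers: int = 5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, sr_frame: torch.Tensor) -> torch.Tensor:
        # The global skip connection keeps the backbone's prediction intact and only
        # learns a correction, which limits information loss from the convolutions.
        return sr_frame + self.body(sr_frame)

# Usage with any VSR backbone that outputs an upscaled frame of shape (N, 3, H, W):
# refined = RefinementModule()(vsr_backbone(lr_frames))   # `vsr_backbone` is hypothetical
```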
Summary
The VSR task is usually performed using information from consecutive frames. To exploit this information, some VSR models [1,2] adopt motion estimation and motion compensation to apply spatio-temporal features. Before generating an edge map, the motion difference between neighboring frames must be detected: when a neighboring frame shows little motion change, there is no significant difference between the edge maps obtained from each frame. Our model uses Canny edge detection to generate an edge map for each frame, except blurred frames. Because artifacts exist in some frames of test sets such as Vid4 [3], incomplete areas are generated. To solve this problem, there has been an attempt to use the correlation between frames [4]. This process resembles the refining step known as post-processing in classic computer vision, but it differs in character: it uses deep learning and, in particular, can minimize information loss, a classic problem of the convolution operation. The proposed refinement module is applied to the VSR field for the first time, and because it is designed to be highly versatile, it is compatible with other VSR models of the user's choice.
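As a concrete illustration of the selective edge-map step described above, the sketch below generates Canny edge maps and decides, from a simple motion-difference measure, whether a neighboring frame contributes a new edge map. The blur test (variance of the Laplacian) and all thresholds are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def selective_edge_map(target: np.ndarray, neighbor: np.ndarray,
                       motion_thresh: float = 2.0, blur_thresh: float = 100.0):
    """Return Canny edge maps for the target frame and, if useful, its neighbor."""
    target_gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
    neighbor_gray = cv2.cvtColor(neighbor, cv2.COLOR_BGR2GRAY)

    target_edges = cv2.Canny(target_gray, 100, 200)

    # Skip blurred neighbors: a low variance of the Laplacian indicates little sharp detail.
    if cv2.Laplacian(neighbor_gray, cv2.CV_64F).var() < blur_thresh:
        return target_edges, None

    # If the motion difference between the frames is small, the neighbor's edge map adds
    # little new information, so the target's edge map is reused instead.
    motion = np.mean(np.abs(target_gray.astype(np.float32) - neighbor_gray.astype(np.float32)))
    if motion < motion_thresh:
        return target_edges, target_edges

    return target_edges, cv2.Canny(neighbor_gray, 100, 200)
```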