Abstract

The event camera has recently emerged as a new type of vision sensor, offering benefits such as low power consumption, high dynamic range (HDR), microsecond temporal resolution, and freedom from motion blur. While event cameras offer numerous advantages over conventional cameras, they capture only changes in intensity and discard much of the scene's appearance detail. This paper proposes an end-to-end UNet-based network, SCSE-E2VID, that synthesizes grayscale images from asynchronous events. We design an event fusion block that feeds more related events to the encoder, allowing the network to extract more informative features. The Spatial and Channel 'Squeeze & Excitation' (SCSE) attention block is employed to suppress artifacts and better extract spatiotemporal features for the decoder. In addition, we add parallel convolutions in the upsampling block to refine the output features, supplementing content lost in the reduced channels. To evaluate the performance of the proposed SCSE-E2VID, we conduct quantitative and qualitative comparisons on the public IJRR and HQF datasets. The results show that our method outperforms state-of-the-art methods in perceptual similarity and structural similarity, and achieves comparable performance in terms of squared error.
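For readers unfamiliar with the SCSE attention mentioned above, the following is a minimal PyTorch sketch of a concurrent spatial and channel 'Squeeze & Excitation' block in its standard form (Roy et al.). The class name, reduction ratio, and the additive fusion of the two branches are illustrative assumptions; they are not taken from the SCSE-E2VID architecture itself.

import torch
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Concurrent spatial and channel 'Squeeze & Excitation' attention.

    Standard formulation (Roy et al.); the reduction ratio and the additive
    combination of the two branches are assumptions, not details from the paper.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel squeeze & excitation (cSE): global pooling + bottleneck MLP
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial squeeze & excitation (sSE): 1x1 conv to a single attention map
        self.sse = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recalibrate features along the channel axis and the spatial axis, then sum
        return x * self.cse(x) + x * self.sse(x)

# Example: recalibrate a decoder feature map of shape (N, 64, H, W)
feats = torch.randn(2, 64, 32, 32)
out = SCSEBlock(64)(feats)  # output has the same shape as the input

In a UNet-style decoder such as the one described above, a block like this would typically be applied to each upsampled feature map before the final reconstruction layer, letting the network emphasize informative channels and spatial locations while suppressing artifact-prone ones.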
