Video stabilization is a challenging task that compensates for global frame shake introduced during video acquisition. Existing three-dimensional video stabilization methods model the camera's perspective projection either through data-driven training or through explicit motion estimation. However, these methods struggle with shaky videos containing abrupt object movements, which produce local motion blur along the direction of motion. This phenomenon is common in real-world scenes with blind motion blur in the foreground. Unfortunately, naively combining a stabilization method with a deblurring method handles this situation poorly: the intensity of motion blur varies continuously across the video, and such a direct combination underuses spatiotemporal information, providing insufficient cues for cross-frame compensation. To alleviate this problem, we propose the Cross-frame-temporal Module framework, which addresses blind motion blur induced by various conditions by exploiting cross-frame temporal features to estimate depth maps and camera motion. Within this framework, a Blur Transform Network (BTNet) is designed to handle spatially varying motion blur by transforming local regions according to their blur intensity, and a Temporal-Aware Network (TANet) further suppresses motion blur by leveraging cross-frame temporal features. In addition, since paired training videos containing motion blur are scarce, which limits the practical applicability of pretrained approaches, the Cross-frame-temporal Module framework adopts a test-time training strategy that requires no pretraining. Extensive experimental results demonstrate that our method outperforms state-of-the-art methods.
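To make the test-time training idea concrete, the following is a minimal PyTorch sketch of per-clip optimization with a blur-aware weighting module and a temporal fusion module. The module architectures, the photometric loss with its regularizer, and all hyperparameters are illustrative assumptions; the names BTNet and TANet are reused from the abstract only as placeholders and do not reflect the authors' implementation.

```python
# Hedged sketch, not the authors' code: a per-clip (test-time) optimization loop
# with a hypothetical blur-weighting module and a hypothetical temporal module.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BTNet(nn.Module):
    """Hypothetical stand-in for the Blur Transform Network: predicts a per-pixel
    confidence map that down-weights regions with strong, spatially varying blur."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame):            # frame: (B, 3, H, W)
        return self.net(frame)           # confidence in (0, 1)


class TANet(nn.Module):
    """Hypothetical stand-in for the Temporal-Aware Network: fuses a short window
    of frames and predicts a dense 2-D correction field for the centre frame."""
    def __init__(self, window=3, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * window, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2, 3, padding=1),
        )

    def forward(self, window_frames):    # (B, T, 3, H, W)
        b, t, c, h, w = window_frames.shape
        return self.net(window_frames.reshape(b, t * c, h, w))


def warp(frame, flow):
    """Backward-warp `frame` (B, 3, H, W) with a dense pixel-space flow (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype), torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys)).unsqueeze(0).to(frame.device) + flow
    grid = torch.stack(                  # normalise to [-1, 1] for grid_sample
        (2 * grid[:, 0] / (w - 1) - 1, 2 * grid[:, 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)


def test_time_stabilize(frames, iters=200, lr=1e-4):
    """Optimize both modules on the input clip itself -- no pretraining."""
    btnet, tanet = BTNet(), TANet()
    opt = torch.optim.Adam(list(btnet.parameters()) + list(tanet.parameters()), lr=lr)
    for _ in range(iters):
        for t in range(1, frames.shape[0] - 1):
            window = frames[t - 1:t + 2].unsqueeze(0)   # (1, 3, 3, H, W)
            centre, prev = frames[t:t + 1], frames[t - 1:t]
            flow = tanet(window)                        # cross-frame correction
            conf = btnet(centre)                        # blur-aware weights
            # Photometric consistency, down-weighted where blur is strong;
            # the regulariser discourages the trivial all-zero confidence.
            photo = (conf * (warp(prev, flow) - centre).abs()).mean()
            loss = photo + 0.1 * (1 - conf).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return btnet, tanet


if __name__ == "__main__":
    clip = torch.rand(8, 3, 64, 64)      # stand-in for a short shaky clip
    test_time_stabilize(clip, iters=1)
```

The key design point illustrated here is that all parameters are fitted on the test clip itself, which is one way to sidestep the scarcity of paired blurry/sharp training videos mentioned above.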