Abstract

User-Generated Content reaches millions of creators and viewers through platforms such as YouTube, which allow amateurs to share casual videos. However, because these videos are typically recorded on mobile phones by non-professionals, their viewing quality tends to suffer from artifacts such as jitter. Digital Video Stabilization (DVS) addresses this problem by turning shaky footage into high-quality, professional-looking videos. Both industry and academia offer a range of traditional and Deep Learning based DVS systems, but each approach has limitations: the former struggles to extract and track features in many scenarios, while the latter struggles with camera path smoothing, a problem that is hard to define in this context. In this paper, we combine the strengths of Deep Learning and traditional methods for video stabilization, using Spatial Transformer Networks together with Exponentially Weighted Moving Averages. To avoid the distortion and blur introduced by 2D transformations, we simulate translations along the x and y axes by padding the edges of the frames. Experimental results show that our system outperforms state-of-the-art proposals and one commercial solution across a wide variety of scene contents and video categories.
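
As a rough illustration of the two ideas named above, the sketch below applies an Exponentially Weighted Moving Average to a per-frame camera trajectory and then realizes the resulting corrections by cropping from border-padded frames instead of warping. This is a minimal Python sketch under our own assumptions: the trajectory is represented as cumulative (dx, dy) translations, and the function names, the smoothing factor `alpha`, and the padding scheme are illustrative choices, not values or interfaces taken from the paper.

```python
import numpy as np

def ewma_smooth(path, alpha=0.1):
    """Exponentially weighted moving average over a camera path.

    path:  (T, 2) array of cumulative per-frame (dx, dy) translations
           (an assumed representation, not specified by the abstract).
    alpha: smoothing factor in (0, 1]; smaller values smooth more.
    Implements s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    """
    smoothed = np.empty_like(path, dtype=float)
    smoothed[0] = path[0]
    for t in range(1, len(path)):
        smoothed[t] = alpha * path[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

def stabilizing_offsets(path, alpha=0.1):
    """Per-frame (dx, dy) corrections that move the shaky path onto
    the smoothed one."""
    return ewma_smooth(path, alpha) - path

def shift_with_padding(frame, dx, dy, pad):
    """Simulate a (dx, dy) translation by cropping a window from a
    border-padded frame, avoiding the interpolation blur of warping.
    Assumes frame is (H, W, C) and |dx|, |dy| <= pad.
    """
    padded = np.pad(frame, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # The crop window moves opposite to the desired content shift.
    x0 = pad - int(round(dx))
    y0 = pad - int(round(dy))
    h, w = frame.shape[:2]
    return padded[y0:y0 + h, x0:x0 + w]

if __name__ == "__main__":
    # Illustrative demo on a synthetic shaky trajectory.
    shaky_path = np.cumsum(np.random.randn(120, 2), axis=0)
    offsets = stabilizing_offsets(shaky_path, alpha=0.05)
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    stabilized = shift_with_padding(frame, *offsets[0], pad=20)
```

In the full system, these per-frame offsets would be applied by shifting each frame within its padded borders, which is what avoids the distortion and blur that 2D warping transformations introduce.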
