Still-Moving: Customized Video Generation without Customized Video Data

Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri

doi:10.1145/3687945

Abstract

Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a T2I model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.
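
The "frozen videos" mentioned in the abstract are described only as repeated images, so the following is a minimal sketch of that one idea rather than the authors' implementation. It assumes a tiny grayscale image stored as nested Python lists; a real pipeline would presumably use tensors. The names `make_frozen_video` and `has_motion` are hypothetical and do not come from the paper.

```python
from copy import deepcopy
from typing import List

# Hypothetical representation: an H x W grayscale image as nested lists.
Frame = List[List[float]]


def make_frozen_video(image: Frame, num_frames: int) -> List[Frame]:
    """Build a 'frozen video' by repeating one still image num_frames times."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Each frame is an independent copy, so editing one frame later
    # does not silently change the others.
    return [deepcopy(image) for _ in range(num_frames)]


def has_motion(video: List[Frame]) -> bool:
    """Return True if any frame differs from the first frame."""
    return any(frame != video[0] for frame in video[1:])


# Illustrative 2 x 2 image (made-up pixel values).
image = [[0.0, 0.5], [1.0, 0.25]]
video = make_frozen_video(image, num_frames=4)
print(len(video), has_motion(video))  # 4 False
```

Such a clip contains no temporal change at all, which is consistent with the abstract's point that training on static videos needs a separate mechanism (the Motion Adapter) to preserve the video model's motion prior.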

More From: ACM Transactions on Graphics
