Abstract

With the rapid growth of IoT multimedia devices, more and more content is delivered in multimedia form, which generally requires more effort and resources from users to create and can be challenging to streamline. In this paper, we propose Text2Animation (T2A), a framework that generates complex multimedia content, namely animation, from a simple textual script. By leveraging recent advances in computational cinematography and video understanding, the proposed workflow dramatically reduces the cinematography knowledge required of users. By incorporating both a fidelity model and an aesthetic model, T2A jointly considers how comprehensively the generated video presents the input script and how well it complies with the given cinematography specifications. Virtual camera placement in the 3D environment is cast as an optimization problem that is solved with dynamic programming to minimize computational complexity. Experimental results show that T2A can shorten the manual animation production process by around 74%, and the proposed optimization framework can improve the perceptual quality of the output video by up to 35%. More video footage can be found at https://www.youtube.com/watch?v=MMTJbmWL3gs.
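To make the camera-placement optimization concrete, the following is a minimal sketch of how a per-shot camera-selection problem of this kind can be solved with dynamic programming. The shot/pose discretization, the `quality` scores (standing in for the combined fidelity and aesthetic models), and the `transition` smoothness function are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical DP sketch: choose one camera pose per shot to maximize the
# total per-shot quality plus shot-to-shot transition score.
# Runs in O(N * K^2) for N shots with K candidate poses each.
from typing import Callable, List, Tuple


def select_cameras(
    quality: List[List[float]],               # quality[t][k]: score of pose k at shot t
    transition: Callable[[int, int], float],  # smoothness score for pose j -> pose k
) -> Tuple[float, List[int]]:
    n = len(quality)
    k = len(quality[0])
    # best[t][c]: best total score of any pose sequence ending with pose c at shot t.
    # Base case (t = 0) is just the per-shot quality of each candidate pose.
    best = [row[:] for row in quality]
    back = [[0] * k for _ in range(n)]
    for t in range(1, n):
        for cur in range(k):
            prev_scores = [best[t - 1][j] + transition(j, cur) for j in range(k)]
            j_star = max(range(k), key=lambda j: prev_scores[j])
            best[t][cur] = quality[t][cur] + prev_scores[j_star]
            back[t][cur] = j_star
    # Recover the optimal pose sequence by backtracking through the table.
    last = max(range(k), key=lambda c: best[-1][c])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path
```

In a full system along these lines, `quality[t][k]` would combine the fidelity and aesthetic model scores for candidate pose k at shot t, and `transition` would penalize camera moves that violate the given cinematography specifications between consecutive shots.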
