Abstract

We address the problem of scene-aware activity program generation, which requires decomposing a given activity task into instructions that can be performed sequentially within a target scene to complete the activity. While existing methods can generate programs that are either rational or executable, generating programs with both high rationality and executability remains a challenge. We therefore propose a novel method whose key idea is to explicitly combine the language rationality of a powerful language model with dynamic perception of the target scene in which the instructions are executed, so that the generated programs are both rational and executable. Our method generates the instructions of an activity program iteratively. Specifically, a two-branch feature encoder operates on language-based and graph-based representations of the current generation progress to extract language features and scene-graph features, respectively. A predictor then uses these features to generate the next instruction of the program. Subsequently, another module performs the predicted action and updates the scene, which is perceived again in the next iteration. Extensive evaluations on the VirtualHome-Env dataset demonstrate the advantages of our method over previous work. Ablation studies validate the key algorithmic designs, and results on other types of inputs show the generalizability of our method.
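
The abstract describes an iterative generate-execute-perceive loop. Below is a minimal, self-contained sketch of that control flow only, not the authors' implementation: the component names and placeholder logic (encode_language, encode_scene, predict_next_instruction, execute, the [END] token, and the toy SceneGraph) are hypothetical stand-ins for the paper's learned two-branch encoder, instruction predictor, and scene-update module.

```python
"""Hypothetical sketch of iterative, scene-aware program generation.
All components are placeholders, not the authors' actual modules."""

from dataclasses import dataclass, field
from typing import List

END = "[END]"  # hypothetical end-of-program token


@dataclass
class SceneGraph:
    """Toy stand-in for the graph representation of the target scene."""
    facts: set = field(default_factory=set)


def encode_language(task: str, history: List[str]) -> str:
    # Language branch placeholder: encodes the task description
    # together with the instructions generated so far.
    return task + " | " + " ; ".join(history)


def encode_scene(scene: SceneGraph) -> str:
    # Graph branch placeholder: encodes the current scene graph.
    return " & ".join(sorted(scene.facts))


def predict_next_instruction(lang_feat: str, scene_feat: str) -> str:
    # Predictor placeholder: fuses both feature branches and emits the
    # next instruction. A real model would be a learned decoder.
    if "switched_on(tv)" in scene_feat:
        return END
    return "[SWITCHON] <tv>"


def execute(instruction: str, scene: SceneGraph) -> SceneGraph:
    # Execution placeholder: applies the instruction's effects so the
    # updated scene can be perceived in the next iteration.
    if instruction == "[SWITCHON] <tv>":
        scene.facts.add("switched_on(tv)")
    return scene


def generate_program(task: str, scene: SceneGraph, max_steps: int = 20) -> List[str]:
    program: List[str] = []
    for _ in range(max_steps):
        lang_feat = encode_language(task, program)   # language branch
        scene_feat = encode_scene(scene)             # scene-graph branch
        instruction = predict_next_instruction(lang_feat, scene_feat)
        if instruction == END:
            break
        program.append(instruction)
        scene = execute(instruction, scene)          # update the scene
    return program


if __name__ == "__main__":
    print(generate_program("Watch TV", SceneGraph({"off(tv)"})))
```

The sketch only makes the stated structure explicit: every prediction conditions on both the language history and the current scene graph, and executing an instruction changes what the next iteration perceives.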
