A Bottom-up Paradigm for Traffic Scene Graph Representation

Zhixuan Zhang, Yuanqi Su, Ping Li, Jinzi Zheng, Chi Zhang, Yuehu Liu

doi:10.1145/3436369.3437437

Abstract

With increasing hardware computing power and model capacity, visual tasks for cognitive scene understanding, such as visual relationship inference, have attracted growing attention. The scene graph representation, formed by coupling object, attribute and relationship nodes drawn from different modalities of information (the original image, foreground things, background stuff and scene attributes), has strongly promoted progress in this research area. In this paper, we address the scene graph representation of traffic scenarios for autonomous driving. It should be noted that the universal representation does not meet the specific needs of cognitive understanding of traffic scenes: on the one hand, it lacks fine-grained descriptions of key objects and attributes; on the other hand, it contains redundant descriptions of objects and relationships. To tackle these problems, we take advantage of the fine-grained instance-level annotation of the traffic scene and propose a bottom-up representation paradigm that makes full use of the hierarchical structure of the traffic scene and the sparsity of element classes. In addition, building on existing methods, we optimize the relationship list of the traffic scene graph representation. Moreover, we improve the scene graph annotation method by proposing a ground-vision joint location method to better describe spatially distributed visual knowledge. The case analysis shows that, compared with existing methods, our scene graph paradigm can represent richer traffic scene information.
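As an illustration of the structure the abstract refers to, a scene graph can be sketched as a set of object nodes, per-object attributes, and relationship triples. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class name `SceneGraph`, its methods, and the example categories, attributes and predicates ("part of", "driving on") are all hypothetical and are not taken from the paper's relationship list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container: objects, attributes, relationship triples."""
    # node id -> category label (e.g. "car", "lane")
    objects: Dict[str, str] = field(default_factory=dict)
    # node id -> list of attribute strings
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    # (subject id, predicate, object id) triples
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, node_id: str, category: str, attrs=()) -> None:
        # Register an object node together with its attributes.
        self.objects[node_id] = category
        self.attributes[node_id] = list(attrs)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Relationships may only connect nodes that already exist.
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be existing object nodes")
        self.relationships.append((subject, predicate, obj))

    def relations_of(self, node_id: str) -> List[Tuple[str, str, str]]:
        # All triples in which the node appears as subject or object.
        return [r for r in self.relationships if node_id in (r[0], r[2])]


# Illustrative (made-up) traffic scene, built from coarse to fine elements.
graph = SceneGraph()
graph.add_object("road_0", "road")
graph.add_object("lane_1", "lane", ["rightmost"])
graph.add_object("car_2", "car", ["white", "moving"])
graph.relate("lane_1", "part of", "road_0")
graph.relate("car_2", "driving on", "lane_1")
```

Storing relationships as explicit triples keeps the graph sparse: only relationships that are actually annotated are recorded, rather than every pairwise combination of objects.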

Full Text