Abstract

The past few decades have witnessed a wealth of promising work on making machines interpret the scenes around us. However, compared with human cognition, machine scene interpretation is still in its infancy. Human language, a highly developed product of human cognition, can therefore serve as an important cue for scene interpretation. In this paper, we survey Tower of Knowledge (ToK) approaches, which exploit human language for scene interpretation. At the core of ToK approaches is a multi-layer architecture, the ToK architecture, which aims to establish the information flow of scene interpretation. In general, the ToK architecture can be applied to scene interpretation by exploiting either its vertical or its horizontal connections. First, we focus on approaches that exploit the vertical connections of the ToK architecture. In such approaches, each identified object in a scene is assigned the optimal label by verifying whether the object has the right characteristics to fulfil the functions that the label implies. Second, we discuss approaches that utilise the horizontal connections of the ToK architecture to interpret a scene according to the asymmetric spatial relationships between objects. Finally, reviewing what has been achieved so far, we outline what the future may hold for ToK.
