Abstract

Language acquisition aided by multi-modal information has recently drawn increasing attention. However, the semantic grounding of verbs has received less attention because of their complex semantic representation. This paper proposes a novel way to incorporate visual information into the semantic representation of Chinese verbs. Building on two constituents from Frame Semantics, the verb frame and the argument, both are linked with visual information to represent verb semantics, and a visual-information-based categorization of arguments is the main focus. To achieve this, a collection of {video, text description} pairs is first built. After preprocessing on both sides, the correspondence between verb arguments and related visual features is constructed based on self-organizing map (SOM) groups. A video description system has also been built to generate sentences for new videos. Evaluation of this system shows the effectiveness of our visual semantic representation of Chinese verbs.
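As a rough illustration of the argument-grounding step described above, the sketch below clusters visual feature vectors with a small self-organizing map and then assigns each verb argument to the SOM cell that its co-occurring clip features activate most often. This is a minimal sketch under assumed inputs: the toy feature dimension, the random "clip features", and the argument labels are hypothetical placeholders, not the paper's actual pipeline or features.

```python
# Hedged sketch: grouping visual features with a small self-organizing map (SOM)
# and mapping verb arguments to SOM cells. Feature extraction, dimensions, and
# the argument labels below are illustrative placeholders, not the paper's setup.
import numpy as np

class MiniSOM:
    """A minimal rectangular SOM trained with the classic online update rule."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        self.weights = rng.normal(size=(rows, cols, dim))
        # Precompute grid coordinates for the neighborhood function.
        self.grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                         indexing="ij"), axis=-1)

    def winner(self, x):
        """Return (row, col) of the best-matching unit for feature vector x."""
        dists = np.linalg.norm(self.weights - x, axis=-1)
        return np.unravel_index(np.argmin(dists), dists.shape)

    def train(self, data, epochs=20, lr0=0.5, sigma0=1.5):
        n = len(data)
        for t in range(epochs * n):
            frac = t / (epochs * n)
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3  # shrinking neighborhood radius
            x = data[t % n]
            bmu = np.array(self.winner(x))
            # Gaussian neighborhood centered on the best-matching unit.
            d2 = np.sum((self.grid - bmu) ** 2, axis=-1)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            self.weights += lr * h * (x - self.weights)


# Toy visual features for video clips (in practice these would be real
# appearance/motion descriptors extracted from the {video, description} collection).
rng = np.random.default_rng(1)
clip_features = rng.normal(size=(200, 16))

som = MiniSOM(rows=4, cols=4, dim=16)
som.train(clip_features)

# Each verb argument is associated with the features of the clips it occurs in;
# its "visual category" is taken to be the SOM cell those features hit most often.
argument_clips = {"agent": clip_features[:50], "patient": clip_features[50:100]}
for arg, feats in argument_clips.items():
    cells = [som.winner(f) for f in feats]
    best_cell = max(set(cells), key=cells.count)
    print(f"{arg} -> SOM group {best_cell}")
```

In the actual system, the resulting SOM groups would serve as the visual categories that link frame arguments to visual evidence when generating sentences for new videos; the majority-cell assignment here is only one plausible way to realize that link.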
