Abstract

A scene graph represents the semantics of an image, that is, its visual content. Scene graphs are widely used in image retrieval and image generation tasks. We develop a tool that generates a scene graph from a single image, with the scene graph expressed in the Thai language. The method consists of three steps: image captioning, scene graph parsing, and machine translation. We also propose a chatbot application that demonstrates how the generated scene graph data can be used. We evaluate both the machine translator and the caption generator. For the translator model, we report BLEU, GLEU, WER, and TER scores, computed for two different ways of preparing the input data. In the first setting, the translator model receives single words, which are mapped to their translations in the local language (Thai) and then composed into a sentence. In the second setting, the whole sentence is sent to the translator model. For the caption generator, we measure METEOR, ROUGE, CIDEr, and SPICE scores. The overall system obtains a CIDEr score of 0.5393 and a GLEU score of 0.6028.
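The two input-preparation settings for the translator, and the WER metric used to score them, can be illustrated with a minimal Python sketch. It assumes the scene graph is given as (subject, relation, object) triples, and it substitutes a hypothetical toy English-Thai lexicon and placeholder translator callables for the paper's actual translator model; the names `Triple`, `LEXICON`, `translate_words`, `compose`, `translate_sentence`, and `word_error_rate` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A scene-graph relation as (subject, relation, object); an assumed representation.
Triple = Tuple[str, str, str]

# Hypothetical toy English->Thai lexicon for illustration only (not the paper's data).
LEXICON: Dict[str, str] = {"man": "ผู้ชาย", "ride": "ขี่", "horse": "ม้า"}


def translate_words(triple: Triple, translate: Callable[[str], str]) -> List[str]:
    """Setting 1: translate each scene-graph word on its own."""
    return [translate(word) for word in triple]


def compose(tokens: List[str]) -> str:
    """Compose word translations into a sentence; Thai script has no inter-word spaces."""
    return "".join(tokens)


def translate_sentence(triple: Triple, translate: Callable[[str], str]) -> str:
    """Setting 2: join the triple into an English phrase and translate it whole."""
    return translate(" ".join(triple))


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # dist[i][j] = word-level edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[-1][-1] / len(reference)


if __name__ == "__main__":
    triple: Triple = ("man", "ride", "horse")
    tokens = translate_words(triple, lambda w: LEXICON.get(w, w))
    print(compose(tokens))  # ผู้ชายขี่ม้า
    # A hypothesis missing one word: 1 deletion over 3 reference words.
    print(word_error_rate(tokens, ["ผู้ชาย", "ม้า"]))
```

Because Thai is written without spaces between words, computing WER on real Thai output would first require a word segmenter; the sketch sidesteps this by comparing token lists directly.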
