Abstract
In this paper, we propose a novel scene retrieval and re-ranking method based on a text-to-image Generative Adversarial Network (GAN). The proposed method generates an image from an input query sentence with the text-to-image GAN and then retrieves the scene most similar to the generated image. By using the image generated from the input query sentence as a query, we can control the semantic information of the query image at the text level. Furthermore, we introduce a novel interactive re-ranking scheme into our retrieval method. Specifically, users can specify the importance of each word in the initial query sentence, and the proposed method re-generates the query image so that it reflects the word importance provided by the users. By updating the generated query image according to the word importance, users can revise the retrieval results through this re-ranking process. In experiments, we show that our retrieval method, including the re-ranking scheme, outperforms recently proposed retrieval methods.
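To make the described pipeline concrete, the following is a minimal sketch of "generate a query image from the sentence, rank scenes by image similarity, then re-rank with user-specified word importance." The function names (`generate_query_image`, `extract_features`, `retrieve`), the cosine-similarity ranking, and the `word_weights` argument are illustrative assumptions standing in for the paper's GAN generator and feature extractor, not its released implementation.

```python
# Sketch of the retrieval and re-ranking flow described in the abstract.
# The GAN generator and feature extractor are placeholders (assumptions).
import numpy as np

def generate_query_image(sentence, word_weights=None):
    """Placeholder for the text-to-image GAN.

    `word_weights` stands in for the user-specified word importance used in
    the re-ranking step; here it merely changes a random seed.
    """
    seed = abs(hash((sentence, tuple(word_weights or [])))) % (2 ** 32)
    rng = np.random.default_rng(seed)
    return rng.random((64, 64, 3))  # dummy RGB "generated" query image

def extract_features(image):
    """Placeholder for an image feature extractor (e.g., pooled CNN activations)."""
    return image.mean(axis=(0, 1))  # toy 3-dimensional feature vector

def retrieve(query_sentence, scene_images, top_k=5, word_weights=None):
    """Rank candidate scenes by cosine similarity to the generated query image."""
    query_feat = extract_features(generate_query_image(query_sentence, word_weights))
    scored = []
    for idx, scene in enumerate(scene_images):
        feat = extract_features(scene)
        cos = feat @ query_feat / (np.linalg.norm(feat) * np.linalg.norm(query_feat) + 1e-8)
        scored.append((cos, idx))
    return sorted(scored, reverse=True)[:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scenes = [rng.random((64, 64, 3)) for _ in range(10)]
    # Initial retrieval, then re-ranking after the user re-weights the words.
    print(retrieve("a dog running on the beach", scenes, top_k=3))
    print(retrieve("a dog running on the beach", scenes, top_k=3,
                   word_weights=[0.2, 1.0, 0.5, 0.1, 0.1, 1.0]))
```

In the actual method, the generator would be the trained text-to-image GAN and the features would come from a deep image model; the sketch only illustrates how a generated query image can drive retrieval and how re-weighting words changes the query and hence the ranking.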
Highlights
With the recent exponential growth of Web services such as YouTube and Netflix, the amount of video data has greatly increased
We introduce a novel interactive re-ranking scheme to the above-mentioned retrieval method based on the structure of the Generative Adversarial Network (GAN)
By comparing with Comparative Method 2 (CM 2), we can confirm that the proposed method can retrieve target scenes while considering the word relationships in an input query sentence
Summary
School of Information Science and Technology, Division of Media and Network Technologies, Hokkaido University, Sapporo 060-0814, Japan. This work was supported in part by the MIC/SCOPE under Grant #181601001.