Exploring Uncertainty Measures for Image-caption Embedding-and-retrieval Task

Kenta Hama,Kuniaki Uehara,Takashi Matsubara,Jianfei Cai

doi:10.1145/3425663

Abstract

With the significant development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is rapidly increasing. On the basis of the concept that “Bayesian deep learning knows what it does not know,” the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By considering an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method has poor performance in reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. We propose posterior uncertainty by considering the retrieval task as a classification task, which can accurately assess the reliability of retrieval results. The consistent performance of the two uncertainty measures is observed with different datasets (MS-COCO and Flickr30k), different deep-learning architectures (dropout and batch normalization), and different similarity functions. To the best of our knowledge, this is the first study to perform a reliability assessment on image-caption embedding-and-retrieval tasks.

Full Text