Uncertainty in Visual Generative AI

Kara Combs,Trevor J Bihl,Adam Moyer

doi:10.3390/a17040136

Abstract

Recently, generative artificial intelligence (GAI) has impressed the world with its ability to create text, images, and videos. However, there are still areas in which GAI produces undesirable or unintended results due to being “uncertain”. Before wider use of AI-generated content, it is important to identify concepts where GAI is uncertain to ensure the usage thereof is ethical and to direct efforts for improvement. This study proposes a general pipeline to automatically quantify uncertainty within GAI. To measure uncertainty, the textual prompt to a text-to-image model is compared to captions supplied by four image-to-text models (GIT, BLIP, BLIP-2, and InstructBLIP). Its evaluation is based on machine translation metrics (BLEU, ROUGE, METEOR, and SPICE) and word embedding’s cosine similarity (Word2Vec, GloVe, FastText, DistilRoBERTa, MiniLM-6, and MiniLM-12). The generative AI models performed consistently across the metrics; however, the vector space models yielded the highest average similarity, close to 80%, which suggests more ideal and “certain” results. Suggested future work includes identifying metrics that best align with a human baseline to ensure quality and consideration for more GAI models. The work within can be used to automatically identify concepts in which GAI is “uncertain” to drive research aimed at increasing confidence in these areas.

Full Text