EditScribe: Non-Visual Image Editing with Natural Language Verification Loops

Abstract
Image editing is an iterative process that requires precise visual evaluation and manipulation for the output to match the editing intent. However, current image editing tools provide neither accessible interaction nor sufficient feedback for blind and low-vision individuals to achieve this level of control. To address this, we developed EditScribe, a prototype system that makes image editing accessible through natural language verification loops powered by large multimodal models. With EditScribe, the user first comprehends the image content through initial general and object descriptions, then specifies edit actions using open-ended natural language prompts. EditScribe performs the image edit and provides four types of verification feedback for the user to verify the performed edit: a summary of visual changes, an AI judgement, and updated general and object descriptions. The user can ask follow-up questions to clarify and probe into the edits or verification feedback before performing another edit. In a study with ten blind or low-vision users, we found that EditScribe enabled participants to perform and verify image edit actions non-visually. We observed different prompting strategies from participants and their perceptions of the various types of verification feedback. Finally, we discuss the implications of leveraging natural language verification loops to make visual authoring non-visually accessible.
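The edit-then-verify workflow described above can be sketched as a simple loop. This is a hypothetical illustration only: all function names, data shapes, and return values below are assumptions for exposition, and the stubs stand in for the large multimodal model calls the real system would make.

```python
# Hypothetical sketch of a natural language verification loop,
# modeled on the workflow described in the abstract.
from dataclasses import dataclass, field


@dataclass
class VerificationFeedback:
    """The four types of verification feedback named in the abstract."""
    visual_changes: str                 # summary of what changed
    ai_judgement: str                   # does the edit match the intent?
    general_description: str            # updated whole-image description
    object_descriptions: dict = field(default_factory=dict)


def perform_edit(image, prompt):
    """Stub: apply an open-ended natural language edit to the image.
    Toy representation: the image is a list of applied edit prompts."""
    return image + [prompt]


def verify_edit(image, prompt):
    """Stub: generate verification feedback for the latest edit."""
    return VerificationFeedback(
        visual_changes=f"Applied: {prompt}",
        ai_judgement="edit appears to match the stated intent",
        general_description=f"Image after {len(image)} edit(s)",
    )


def editing_session(image, prompts):
    """One non-visual editing session: for each prompt, perform the
    edit, then return feedback the user can verify before continuing."""
    history = []
    for prompt in prompts:
        image = perform_edit(image, prompt)
        history.append(verify_edit(image, prompt))
    return image, history
```

A session over two prompts, e.g. `editing_session([], ["brighten the sky", "crop to the dog"])`, yields the edited image state plus one feedback object per edit, mirroring the loop in which the user verifies each change before issuing the next prompt.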
