Abstract
Natural language descriptions of user interface (UI) elements such as alternative text are crucial for accessibility and for language-based interaction in general. Yet, these descriptions are frequently missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input, including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality, as well as the choice of learning strategies, impacts the quality of predicted captions. The task formulation, the dataset, and our benchmark models together contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.
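To make the multimodal setup concrete, below is a minimal illustrative sketch, not the paper's actual architecture: it fuses a CNN encoding of the element's screenshot crop with an embedding of structural properties from the view hierarchy, then decodes a caption with an LSTM. All layer sizes, feature dimensions, and names (e.g. struct_feat_dim, WidgetCaptioner) are assumptions made for the example.

```python
# Hedged sketch of a multimodal widget-captioning model (illustrative only).
import torch
import torch.nn as nn


class WidgetCaptioner(nn.Module):
    def __init__(self, vocab_size, struct_feat_dim=16, hidden_dim=256):
        super().__init__()
        # Image branch: encode pixels of the UI element (e.g. a 64x64 crop).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Structure branch: encode view-hierarchy properties (element type,
        # bounds, clickability, ...) flattened into a fixed-size vector.
        self.struct_encoder = nn.Sequential(
            nn.Linear(struct_feat_dim, hidden_dim), nn.ReLU(),
        )
        # Fuse both modalities into the decoder's initial hidden state.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, struct_feats, caption_tokens):
        img = self.image_encoder(image)             # (B, H)
        st = self.struct_encoder(struct_feats)      # (B, H)
        h0 = torch.tanh(self.fuse(torch.cat([img, st], dim=-1))).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        emb = self.embed(caption_tokens)            # (B, T, H)
        dec, _ = self.decoder(emb, (h0, c0))
        return self.out(dec)                        # (B, T, vocab)


# Smoke test with random tensors standing in for a batch of UI elements.
model = WidgetCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 16),
               torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 1000])
```

The sketch only illustrates the general idea of combining image and structural modalities before decoding; the paper's benchmark models and feature extraction differ in their details.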
Highlights
Mobile apps come with a rich and diverse set of design styles that are often more graphical and unconventional than those of traditional desktop applications
A novel task to automatically generate captions for user interface (UI) elements based on their visual appearance, structural properties, and context
We propose widget captioning as a task for automatically generating language descriptions for UI elements in mobile user interfaces; the task raises unique challenges for modeling and extends the popular image captioning task to the user interface domain
Summary
Mobile apps come with a rich and diverse set of design styles that are often more graphical and unconventional than those of traditional desktop applications. Language descriptions of user interface (UI) elements, which we refer to as widget captions, are a precondition for many aspects of mobile UI usability and enable many language-based interaction capabilities on mobile UIs. A significant portion of mobile apps today lack widget captions in their user interfaces, a problem that has stood out as a primary issue for mobile accessibility (Ross et al., 2018, 2017). More than half of image-based elements have missing captions (Ross et al., 2018). Beyond image-based ones, our analysis of a UI corpus showed that a wide range of elements have missing captions. Existing tools for examining and fixing missing captions (AccessibilityScanner, 2019; AndroidLint, 2019; Zhang et al., 2018, 2017; Choo et al., 2019) require developers to manually compose a language description for each element, which imposes a substantial overhead on developers.