Mobile app accessibility concerns the rights and interests of diverse social groups, and it is vital for the millions of smartphone users who are visually impaired, given the variety of mobile applications available on Google Play and the App Store. Most application icons, however, lack natural-language labels, making it difficult for these users to operate their phones with the screen readers built into mobile operating systems. The resulting inability of millions of visually impaired smartphone users to interact with mobile applications has become a dark side of the socio-cyber world. COALA is a pilot work that addresses this issue by automatically generating textual labels from icon images. In real-world scenarios, however, most icon datasets have imbalanced distributions: only a few categories have abundant labeled samples, while the remaining majority have very few. To address this data imbalance in the icon label generation task, we propose an interconnected two-stream language model with mean teacher learning, which learns a generalized feature representation from divergent data distributions. Extensive experiments demonstrate the superiority of our two-stream language model over previous single-stream language models on different low-resource datasets. Further experimental results show that our method outperforms the COALA model by a wide margin in mitigating this dark side of the socio-cyber world.
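As a rough illustration of the mean teacher component named above, the sketch below shows the standard mean-teacher recipe (an EMA teacher plus a consistency loss on unlabeled data); the class name, hyperparameters, and PyTorch framing are assumptions for illustration, not details from the paper.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanTeacher:
    """Hypothetical mean-teacher wrapper: the teacher is an EMA of the student."""

    def __init__(self, student: nn.Module, ema_decay: float = 0.99):
        self.student = student
        self.teacher = copy.deepcopy(student)   # teacher starts as a copy
        for p in self.teacher.parameters():
            p.requires_grad_(False)             # teacher is never trained directly
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        # Teacher weights track an exponential moving average of student weights.
        for t, s in zip(self.teacher.parameters(), self.student.parameters()):
            t.mul_(self.ema_decay).add_(s, alpha=1.0 - self.ema_decay)

    def consistency_loss(self, x_unlabeled: torch.Tensor) -> torch.Tensor:
        # The student is pushed toward the teacher's predictions on unlabeled
        # icons, so low-resource categories can benefit from unlabeled data.
        with torch.no_grad():
            target = F.softmax(self.teacher(x_unlabeled), dim=-1)
        pred = F.log_softmax(self.student(x_unlabeled), dim=-1)
        return F.kl_div(pred, target, reduction="batchmean")
```

In this recipe the consistency loss is added to the usual supervised loss on labeled icons, and `update_teacher` is called after each optimizer step.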