Abstract

Web accessibility evaluation is a costly process that usually requires manual intervention. Recently, large language model (LLM) based systems have gained popularity and shown promising capabilities in tasks that previously required domain-specific programming knowledge or were considered impossible to perform automatically. Our research explores whether an LLM-based system can evaluate web accessibility success criteria that require manual evaluation. Three success criteria of the Web Content Accessibility Guidelines (WCAG) that currently require manual checks were tested: 1.1.1 Non-text Content, 2.4.4 Link Purpose (In Context), and 3.1.2 Language of Parts. LLM-based scripts were developed to evaluate the test cases, and their results were compared against current web accessibility evaluators. While the automated accessibility evaluators were unable to reliably test the three WCAG criteria, often missing issues or only warning about them, the LLM-based scripts identified accessibility issues the tools missed, achieving an overall detection rate of 87.18% across the test cases. The results demonstrate that LLMs can augment automated accessibility testing, catching issues that purely automated tools currently miss. Further research should expand the evaluation across more test cases and types of content.
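The abstract does not reproduce the LLM-based scripts themselves. As an illustration only, the sketch below shows one plausible way such a check could be structured for criterion 2.4.4 Link Purpose (In Context), assuming the OpenAI Python SDK (>= 1.0) and BeautifulSoup; the prompt wording, model name, and the check_link_purpose helper are assumptions, not taken from the paper.

```python
"""Illustrative sketch (not the authors' published scripts): an LLM-assisted
check for WCAG 2.4.4 Link Purpose (In Context). It extracts each link and its
surrounding context from an HTML document and asks an LLM whether the link's
purpose is clear from that context."""

from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are a web accessibility auditor checking WCAG 2.4.4 Link Purpose "
    "(In Context). Given the link text and its surrounding context, answer "
    "PASS if the purpose of the link can be determined from the link text "
    "together with its context, or FAIL if it cannot. Answer with a single "
    "word, PASS or FAIL.\n\n"
    "Link text: {link_text}\n"
    "Surrounding context: {context}\n"
)


def check_link_purpose(html: str, model: str = "gpt-4o-mini") -> list[dict]:
    """Return one verdict per <a href> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a", href=True):
        link_text = link.get_text(strip=True)
        # Use the enclosing block element (paragraph, list item, table cell)
        # as the programmatically determinable context for the link.
        parent = link.find_parent(["p", "li", "td", "th"]) or link.parent
        context = parent.get_text(" ", strip=True)

        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": PROMPT_TEMPLATE.format(
                    link_text=link_text or "(empty)", context=context
                ),
            }],
        )
        verdict = response.choices[0].message.content.strip().upper()
        results.append({
            "href": link["href"],
            "link_text": link_text,
            "verdict": "FAIL" if "FAIL" in verdict else "PASS",
        })
    return results


if __name__ == "__main__":
    sample = '<p>Our pricing changed. <a href="/more">Click here</a>.</p>'
    for item in check_link_purpose(sample):
        print(item)
```

A conventional automated checker would typically only warn about generic link text such as "Click here"; the point of the LLM step is to judge whether the surrounding context resolves the link's purpose, which is what otherwise requires a manual review.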
