CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have long served as a primary defense against malicious automated attacks on public systems. Audio CAPTCHAs, one of the most important CAPTCHA forms, provide an accessible test for visually impaired users. In recent years, however, most existing audio CAPTCHAs have been successfully broken by machine learning-based audio recognition algorithms, exposing their insecurity. In this paper, a generative adversarial network (GAN)-based method is proposed to generate adversarial audio CAPTCHAs. The method uses a generator to synthesize noise, a discriminator to make it similar to the target, and a threshold function to limit the size of the perturbation; the synthesized perturbation is then combined with the original audio to produce the adversarial audio CAPTCHA. Experimental results demonstrate that adding adversarial examples greatly reduces the recognition accuracy of automatic recognition models and improves the robustness of different types of audio CAPTCHAs. We also explore ensemble learning strategies to improve the transferability of the proposed adversarial audio CAPTCHA method. To investigate the effect of adversarial CAPTCHAs on human users, a user study is also conducted.
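
To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of a GAN-style setup in which a generator produces a perturbation, a threshold clips its magnitude, and a discriminator pushes the perturbed audio toward the target; the waveform length, the bound EPSILON, and all network sizes are assumed for illustration.

```python
# Minimal sketch, assuming raw 16 kHz waveforms and hypothetical hyperparameters.
import torch
import torch.nn as nn

AUDIO_LEN = 16000   # assumed: 1 s of 16 kHz audio
EPSILON = 0.05      # assumed perturbation bound (the "threshold function")

class Generator(nn.Module):
    """Maps an original waveform to a bounded perturbation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, padding=15), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=31, padding=15), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=31, padding=15), nn.Tanh(),
        )

    def forward(self, x):                      # x: (batch, 1, AUDIO_LEN)
        delta = self.net(x)
        # Threshold: limit the size of the perturbation.
        return torch.clamp(delta, -EPSILON, EPSILON)

class Discriminator(nn.Module):
    """Scores whether a waveform looks like the target audio."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=4), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 32, kernel_size=31, stride=4), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

# One illustrative adversarial training step.
gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

original = torch.randn(8, 1, AUDIO_LEN)        # stand-in for a CAPTCHA batch
target = torch.randn(8, 1, AUDIO_LEN)          # stand-in for target audio

# Discriminator step: distinguish target audio from adversarial audio.
adversarial = original + gen(original)
d_loss = bce(disc(target), torch.ones(8, 1)) + \
         bce(disc(adversarial.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make original + perturbation fool the discriminator.
g_loss = bce(disc(original + gen(original)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, a loss term from the attacked recognition model would also be needed so that the perturbation actually degrades automatic recognition; the sketch above only shows the generator/discriminator/threshold structure named in the abstract.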