Abstract

The development of a question answering (QA) system for application programming interface (API) documentation can greatly assist developers in API-related tasks. However, when deep learning techniques are applied, API QA systems suffer from the spurious solution problem: the answer may literally appear at multiple positions (i.e., start-end indices) in the API documentation, though only one of them (called the golden solution) correctly solves the question given its context. The other, incorrect candidates (called spurious solutions) hinder the neural network model from learning reasonable solutions or correct answers. In this work, we propose Clean-and-Learn, an effective and robust method for API QA over documents. To reduce the spuriousness of the candidate solutions used for training, we design several scoring functions to rank the candidate occurrences (clean). Only high-quality (top-k) candidate solutions are involved in training. Then, we perform multi-task learning by weighing the losses computed from the top-k occurrences (learn). We evaluate our method on the constructed APIQASet dataset. The experimental results show that Clean-and-Learn achieves a ROUGE-L score of 75.8 and an accuracy of 70.5% in API QA, significantly outperforming state-of-the-art approaches.
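The clean-and-learn objective described above can be sketched as follows. This is a minimal, hypothetical illustration only: the function name, the use of softmax to turn scores into loss weights, and the plain-Python tensors are my assumptions, not details taken from the paper, which does not specify its scoring functions or weighting scheme here.

```python
import math

def clean_and_learn_loss(scores, losses, k=3):
    """Illustrative sketch (not the paper's actual implementation).

    scores: per-candidate quality scores from some ranking function
    losses: per-candidate training losses for the same occurrences
    """
    # "Clean": keep only the top-k highest-scored candidate occurrences.
    k = min(k, len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # "Learn": normalize the surviving scores (softmax, an assumed choice)
    # into weights and combine the corresponding losses.
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    return sum((e / z) * losses[i] for e, i in zip(exps, top))
```

With k = 1 this degenerates to training on the single best-ranked occurrence; larger k hedges against the ranker itself picking a spurious solution, at the cost of admitting noisier supervision.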
