Abstract

Statistical language models (LMs) have been applied in several software engineering applications. However, they have difficulty handling ambiguity in the names of program and API elements (classes and method calls). In this paper, inspired by the success of deep neural networks (DNNs) in natural language processing, we present Dnn4C, a DNN language model that complements the local context of lexical code elements with both syntactic and type contexts. We design a context-incorporating method that uses syntactic and type annotations of source code so that the model learns to distinguish lexical tokens appearing in different syntactic and type contexts. Our empirical evaluation on code completion for real-world projects shows that Dnn4C achieves relative improvements in top-1 accuracy of 11.6%, 16.3%, 27.1%, and 44.7% over the state-of-the-art language models for source code used with the same features: RNN LM, DNN LM, SLAMC, and n-gram LM, respectively. In a second application, we show that Dnn4C improves accuracy over the n-gram LM when migrating source code from Java to C# with a machine translation model.
