Abstract

Every day, millions of source code files are written in different languages around the world. A deep neural network-based intelligent support model for source code completion would be a great advantage in the software engineering and programming education fields. Source codes continue to contain vast numbers of syntax, logical, and other critical errors that normal compilers cannot detect, so the development of an intelligent evaluation methodology that does not rely on manual compilation has become essential. Even experienced programmers often must analyze an entire program to find a single error, wasting valuable time debugging their source code. With this in mind, we propose an intelligent model based on long short-term memory (LSTM) combined with an attention mechanism for source code completion. The proposed model can detect source code errors with their locations and then predict the correct words. In addition, the proposed model can classify source codes as erroneous or not. We trained the proposed model on source code and then evaluated its performance. All of the data used in our experiments were extracted from the Aizu Online Judge (AOJ) system. The experimental results show that the error detection and prediction accuracy of our proposed model is approximately 62% and its source code classification accuracy is approximately 96%, outperforming a standard LSTM and other state-of-the-art models. Moreover, in comparison to state-of-the-art models, our proposed model achieved a notable level of success in error detection, prediction, and classification when applied to long source code sequences. Overall, these experimental results indicate the usefulness of our proposed model in the software engineering and programming education arenas.
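The attention mechanism combined with the LSTM in the model described above re-weights the network's hidden states before a prediction is made. As an illustration only (not the authors' implementation), here is a minimal NumPy sketch of dot-product attention over a sequence of hidden states; the function names, the dot-product scoring, and the toy dimensions are all assumptions made for the example:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(hidden_states, query):
    """Score each hidden state against the query vector, then return
    the attention-weighted sum (the context vector) and the weights."""
    scores = hidden_states @ query           # (T,) alignment scores
    weights = softmax(scores)                # distribution over time steps
    context = weights @ hidden_states        # weighted sum of states
    return context, weights

# Toy example: 4 time steps, hidden size 3
np.random.seed(0)
H = np.random.randn(4, 3)   # stand-in for LSTM hidden states
q = np.random.randn(3)      # stand-in for a learned query vector
context, weights = attention_context(H, q)
print(weights.sum())        # the weights form a probability distribution
```

In a trained model, `H` would come from the LSTM and `q` would be learned; the context vector then feeds the output layer that predicts the next token or the error class.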

Highlights

  • Programming is one of mankind’s most creative and effective endeavors, and vast numbers of studies have been dedicated to improving the modeling and understanding of software code [1]. The outcomes of many such studies support a wide variety of core software engineering (SE) purposes, such as error detection, error prediction, error location identification, snippet suggestions, code patch generation, developer modeling, and source code classification [1]

  • Different n-gram models such as bigram, trigram, skip-gram [4], and GloVe [5] are statistical language models that are very useful in language modeling tasks. This burgeoning usage has stimulated the availability of large text corpora and is helping natural language processing (NLP) techniques become more effective day by day. NLP language models are not effective in complex SE endeavors but are still useful for intuitive language modeling
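As an illustration of the statistical language models mentioned above, the following is a minimal maximum-likelihood bigram model over a toy token stream; the helper names and the sample tokens are illustrative assumptions, and a practical model would add smoothing for unseen pairs:

```python
from collections import Counter

def train_bigram(tokens):
    """Return a conditional probability function P(w2 | w1)
    estimated by maximum likelihood from the token stream."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    def prob(w2, w1):
        # No smoothing: unseen contexts get probability 0
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return prob

# Toy "source code" token stream
code_tokens = ["int", "x", "=", "0", ";", "int", "y", "=", "1", ";"]
p = train_bigram(code_tokens)
print(p("x", "int"))  # 0.5: "int" is followed by "x" in 1 of its 2 occurrences
```

Such count-based models capture only short, fixed-length contexts, which is the limitation that motivates the neural language models discussed in this paper.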

  • Our proposed model is expected to be effective in providing end-to-end solutions for programming learners and professionals in the SE field. The experimental results obtained in this study show that the accuracy of error detection and prediction using our proposed long short-term memory (LSTM)-AM model is approximately 62%, whereas the standard LSTM model’s accuracy is approximately 31%



Introduction

Programming is one of mankind’s most creative and effective endeavors, and vast numbers of studies have been dedicated to improving the modeling and understanding of software code [1]. The outcomes of many such studies support a wide variety of core software engineering (SE) purposes, such as error detection, error prediction, error location identification, snippet suggestions, code patch generation, developer modeling, and source code classification [1]. Another study showed that the recurrent neural network (RNN) model, which can retain longer source code context than traditional n-gram and other language models, has achieved notable success in language modeling [7]. However, the RNN model faces limitations in representing the context of longer source codes due to vanishing or exploding gradients [8], which make it hard to train on long dependent source code sequences; as a result, it is effective only for a small corpus of source codes. An important research field that has recently emerged involves the use of AI systems for source code completion during development rather than manual compiling processes. In one such study, RNN and LSTM language models were trained, and the results obtained showed that the LSTM model performed better than the RNN model. That study used a Java project corpus to evaluate the performance of the language models.
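The vanishing-gradient limitation of RNNs mentioned above can be illustrated numerically: backpropagation through a long recurrence multiplies the gradient by the recurrent Jacobian at every time step, so its norm decays (or explodes) geometrically with sequence length. A minimal sketch, assuming a linear recurrence with a fixed weight matrix as a deliberate simplification of a real RNN:

```python
import numpy as np

W = 0.5 * np.eye(3)   # recurrent weight matrix with spectral radius 0.5 (< 1)
grad = np.ones(3)     # gradient arriving at the final time step

norms = []
for t in range(50):   # backpropagate through 50 time steps
    grad = W.T @ grad             # each step multiplies by the Jacobian
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the norm shrinks geometrically toward zero
```

With a spectral radius above 1 the same loop would show exploding gradients instead; LSTM gating and attention are standard remedies for exactly this decay over long source code sequences.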
