Abstract
Programming is a vital skill in computer science and engineering-related disciplines. However, developing source code is an error-prone task. Logical errors in code are particularly hard to identify for both students and professionals, and a single error is unexpected to end-users. At present, conventional compilers have difficulty identifying many of the errors (especially logical errors) that can occur in code. To mitigate this problem, we propose a language model for evaluating source codes using a bidirectional long short-term memory (BiLSTM) neural network. We trained the BiLSTM model with a large number of source codes with tuning various hyperparameters. We then used the model to evaluate incorrect code and assessed the model’s performance in three principal areas: source code error detection, suggestions for incorrect code repair, and erroneous code classification. Experimental results showed that the proposed BiLSTM model achieved 50.88% correctness in identifying errors and providing suggestions. Moreover, the model achieved an F-score of approximately 97%, outperforming other state-of-the-art models (recurrent neural networks (RNNs) and long short-term memory (LSTM)).
Highlights
Programming is among the most critical skills in the field of computing and software engineering
A series of experiments based on the source codes collected from Aizu Online Judge (AOJ) was conducted using greatest common divider (GCD) and insertion sort (IS) problem codes
It is generally recognized that conventional compilers and other code evaluation systems are unable to reliably detect logic errors and provide proper suggestions for code repair
Summary
Developing source code is an error-prone task. Conventional compilers have difficulty identifying many of the errors (especially logical errors) that can occur in code. To mitigate this problem, we propose a language model for evaluating source codes using a bidirectional long short-term memory (BiLSTM) neural network. We trained the BiLSTM model with a large number of source codes with tuning various hyperparameters. We used the model to evaluate incorrect code and assessed the model’s performance in three principal areas: source code error detection, suggestions for incorrect code repair, and erroneous code classification. The model achieved an Fscore of approximately 97%, outperforming other state-of-the-art models (recurrent neural networks (RNNs) and long short-term memory (LSTM)).
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have