Modern software engineers use many tools to speed up the development process. Many of them use integrated development environments (IDEs), which provide services such as text editors, debuggers and even intelligent code completion. This paper is dedicated to the development of a model for predicting variants of program source code termination. To improve the accuracy of the model, we used combinations of Markov chains constructed using different ways of calculating the current context of the program: linear and with AST. The linear way of computing the context is an analysis of the tokenized representation of the source code. The second method, on the other hand, uses a representation of the source code in the form of an abstract syntax tree. Combining the different models preserves more semantic information about the code, also adding the ability to support custom code writing style features. In order to compare the different models, a new dataset has been created specifically for the Pascal language. A detailed comparison of the working mechanisms as well as the prediction accuracy on the collected data is given. The proposed model showed high enough accuracy of predictions with minimal computation costs, which allows using it in integrated development environments.
Read full abstract