Program representations for predictive compilation: State of affairs in the early 20’s

Anderson Faustino Da Silva, Otávio Oliveira Napoli, Nilton Luiz Queiroz, Fernando Magno Quintão Pereira, Edson Borin

doi:10.1016/j.cola.2022.101171

Abstract

In the last five years, predictive compilation has made great strides. Contributions in the field include new program embeddings, new learning architectures, and datasets with millions of programs. This paper evaluates 25 state-of-the-art program embeddings, three of them new, plus two learning models from previous work. We trained this apparatus on three large datasets and applied it to three classification problems. When classifying programs according to the problem that they solve, we reproduced the high-accuracy results seen in previous work. However, we were not able to repeat these results in the two new classification challenges that we study: determining the depth of the most deeply nested loop in a program, and determining the best sequence of optimizations to reduce the code size of programs. These negative results emerged despite the large number of classifiers, 25, that we evaluated. Surprisingly, using the histogram of instruction opcodes, a very simple program embedding, led to about the same classification accuracy as embeddings like Ir2Vec or Inst2Vec, which were designed to solve stochastic compilation tasks.
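
To make the notion of an opcode histogram concrete, the following minimal sketch counts how often each instruction opcode occurs in a program and returns the counts as a fixed-length vector. It is an illustration only, not the implementation evaluated in the paper: the eight-entry `VOCABULARY`, the helper names `extract_opcodes` and `opcode_histogram`, and the simplified LLVM-like input text are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical, deliberately small opcode vocabulary (an assumption for this
# sketch); a realistic one would list every opcode of the chosen IR.
VOCABULARY = ["add", "mul", "load", "store", "icmp", "br", "call", "ret"]


def extract_opcodes(ir_text):
    """Return one opcode per instruction line of simplified LLVM-like text."""
    opcodes = []
    for raw in ir_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        # Skip blank lines, block labels, and function headers/footers.
        if not line or line.endswith(":") or line.startswith(("define", "declare", "}")):
            continue
        # For "%x = op ...", the opcode follows the assignment.
        if " = " in line:
            line = line.split(" = ", 1)[1]
        opcodes.append(line.split()[0])
    return opcodes


def opcode_histogram(opcodes, vocabulary=VOCABULARY):
    """Map a list of opcodes to a vector of counts, one slot per vocabulary entry."""
    counts = Counter(opcodes)
    return [counts[op] for op in vocabulary]


EXAMPLE_IR = """
define i32 @f(i32 %a, i32 %b) {
entry:
  %s = add i32 %a, %b      ; sum
  %p = mul i32 %s, %s
  %c = icmp sgt i32 %p, 0
  br i1 %c, label %then, label %else
then:
  ret i32 %p
else:
  ret i32 %s
}
"""

embedding = opcode_histogram(extract_opcodes(EXAMPLE_IR))
print(embedding)  # [1, 1, 0, 0, 1, 1, 0, 2]
```

The resulting vector ignores instruction order, operands, and control flow entirely, which is what makes this embedding so simple compared with learned representations.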

Full Text