Abstract

We've embraced the Growing a Language philosophy (Guy Steele, OOPSLA '98): languages expose a small core, while libraries add domain-specific functionality, manage communication, and scale computation to thousands of machines in a reliable fashion. It's both amazing and scary to see what can be achieved with a small core language and a liberal syntax: we're at a point where programmers use domain-specific languages without even realizing it, thanks to seamless host-language integration. Yet there is a conflict in this approach: although libraries open up domain-specific optimization opportunities, such as fusion, we lack the means to exploit them. Being implemented in terms of the core language, both libraries and client programs are compiled down to low-level intermediate representations, where even the smartest optimizers can't recover the high-level optimization opportunities. There's a fundamental mismatch: despite their good integration with the host language, domain-specific libraries simply don't have good optimization mechanisms. We are sorely missing a mechanism for library self-optimization and, even more, for inter-library optimization. Unfortunately, the existing approaches, such as macros, multi-stage programming and partially evaluated interpreters (like Truffle), are difficult for mainstream programmers to use and make it almost impossible to express optimizations that span different libraries. Missing this cross-library composition means each optimization has to make pessimistic assumptions about the code outside its control, reducing the number of optimizations possible for a library taken alone. For example, there's little one can do to optimize an off-heap array in general, but if we know it stores pairs of integers or sparse matrices, there's suddenly an opportunity to reorganize the data and make access more efficient. Therefore, the most benefit can be gained from inter-library optimizations.
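To make the pair-of-integers example concrete, here is a plain-Scala sketch (not tied to the plugin's API; the object and method names are mine) of the kind of data reorganization such a transformation could perform automatically: encoding an `(Int, Int)` pair as an unboxed `Long`, with explicit conversion methods between the two representations.

```scala
// Hypothetical encoding: a pair of 32-bit ints packed into one 64-bit Long,
// avoiding the heap allocation of a Tuple2 and its boxed fields.
object IntPairAsLong {
  // high-level type -> representation type
  def toRepr(pair: (Int, Int)): Long =
    (pair._1.toLong << 32) | (pair._2.toLong & 0xFFFFFFFFL)

  // representation type -> high-level type
  def toHigh(repr: Long): (Int, Int) =
    ((repr >>> 32).toInt, repr.toInt)
}

// Client code is written against pairs; a transformation could rewrite it
// to operate on the packed longs directly, with no tuples allocated.
val packed: Array[Long] = Array.tabulate(4)(i => IntPairAsLong.toRepr((i, -i)))
val sumOfFirsts = packed.map(l => IntPairAsLong.toHigh(l)._1).sum
```

A compiler-driven transformation would insert such conversions (and eliminate redundant pairs of them) automatically; writing them by hand, as above, only shows the target representation.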
In this presentation, I show my work on programmer-driven domain-specific transformations, which can work across different libraries. This approach expresses high-level transformations in the host language in a natural and self-contained way, making them amenable to distribution along with libraries or as separate artifacts. In turn, this allows library users to mix and match both the libraries and the transformations they need for the task at hand. Furthermore, each transformation is expressed in terms of conversion methods, which allow triggering it for both generic targets (e.g. all vectors, regardless of the element type) and specific ones (e.g. only vectors of pairs of numeric types): the full expressivity of the type system is available to select the transformation target and to specify the transformation result. On the compiler side, the work focuses on correctly transforming programs under an open-world assumption, allowing object-oriented dynamic dispatch and overriding. There are many considerations to take into account. For example, when overriding methods change signatures due to the transformation, bridges need to be created in order to preserve the object model. The Scala implementation, a compiler plugin, relies on a type-flow-based data representation transformation mechanism (Late Data Layout) that currently powers specialization (through miniboxing), value class transformations and compiler support for multi-stage programming. The implementation is open source at https://github.com/miniboxing/ildl-plugin and the paper is available at https://infoscience.epfl.ch/record/207050. You can also read about the sample transformations, such as deforestation, array-of-struct to struct-of-array conversion and localized optimistic specialization, on the project wiki: https://github.com/miniboxing/ildl-plugin/wiki.
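As an illustration of one of the sample transformations listed above, the following plain-Scala sketch (again not the plugin's API; all names are hypothetical) shows the array-of-struct to struct-of-array idea: a collection of points stored as two primitive arrays instead of an array of objects, with conversion methods mediating between the two views.

```scala
// High-level view: an array of case-class instances (array of struct).
final case class Point(x: Double, y: Double)

// Representation: one primitive array per field (struct of array),
// which is denser in memory and friendlier to sequential access.
final case class PointsSoA(xs: Array[Double], ys: Array[Double])

object AosToSoa {
  // high-level view -> struct-of-array representation
  def toRepr(points: Array[Point]): PointsSoA =
    PointsSoA(points.map(_.x), points.map(_.y))

  // struct-of-array representation -> high-level view
  def toHigh(repr: PointsSoA): Array[Point] =
    Array.tabulate(repr.xs.length)(i => Point(repr.xs(i), repr.ys(i)))
}

// A computation over the SoA representation touches only the field it needs.
val soa = AosToSoa.toRepr(Array(Point(1, 2), Point(3, 4), Point(5, 6)))
val meanX = soa.xs.sum / soa.xs.length
```

In the actual system, the conversion methods serve as the transformation's specification, and the compiler rewrites code in the marked scope so that intermediate conversions are coalesced or removed entirely.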

