Vectorized execution engines process large datasets by decomposing computations into tight loops, which modern hardware can execute more efficiently. Providing loops that are optimal for execution usually adds a burden to the software development process: developers are required to understand the details of vectorized execution, columnar data layout, data encodings, and the code compilation process itself. This presents a steep learning curve and creates challenges for organizations building and scaling large engineering teams. Scalar function authoring accentuates this problem because of the sheer number of functions involved. In our experience building the Velox open source execution engine, we have observed that exposing a large number of developers to the complexity inherent to vectorization resulted in a disproportionate number of bugs and performance inefficiencies. In this paper, we describe the simple function interface (SFI) created to address this issue. SFI greatly simplifies scalar function authoring by encapsulating the vectorization complexity required to generate tight loops and presenting developers with a simpler, more concise, and more natural row-based interface, without sacrificing performance. SFI also hides columnar layout details, while giving developers the flexibility to efficiently implement advanced features such as functions with nested and recursive parameter types, type variables, variadic parameters, and generic types. Today, more than a thousand functions have been added to Velox using the SFI, implementing popular open source SQL dialects and internal domain-specific use cases at Meta, and are in active production use. While this paper presents implementation details, performance pitfalls, experimental results, and our overall experience developing the state-of-the-art Velox vectorized execution engine, we believe the concepts and trade-offs to be fundamentally equivalent and generally applicable to other vectorized engines.
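To make the row-based idea concrete, the following is a minimal sketch of the general pattern, not Velox's actual API. The names `Column`, `PlusFunction`, and `applyBinary` are hypothetical and chosen only for illustration, and the sketch assumes flat, unencoded columns with a per-row validity flag. The function author writes only the per-row `call` method; a generic adapter, written once, supplies the tight loop and the null handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat column: values plus a per-row validity flag.
template <typename T>
struct Column {
  std::vector<T> values;
  std::vector<bool> valid;  // valid[i] == false means row i is NULL
};

// What a function author writes: plain row-at-a-time logic,
// with no knowledge of columns, encodings, or null masks.
struct PlusFunction {
  // Returns false to signal a NULL result for this row.
  bool call(int64_t& out, int64_t a, int64_t b) const {
    out = a + b;
    return true;
  }
};

// What the engine supplies once (an illustrative adapter): it runs the
// row-based function over whole columns in a single tight loop and
// propagates NULL inputs to NULL outputs.
template <typename Out, typename Fn, typename A, typename B>
Column<Out> applyBinary(const Fn& fn, const Column<A>& lhs,
                        const Column<B>& rhs) {
  assert(lhs.values.size() == rhs.values.size());
  assert(lhs.valid.size() == lhs.values.size());
  assert(rhs.valid.size() == rhs.values.size());

  const std::size_t n = lhs.values.size();
  Column<Out> result{std::vector<Out>(n), std::vector<bool>(n, false)};
  for (std::size_t i = 0; i < n; ++i) {
    if (lhs.valid[i] && rhs.valid[i]) {
      Out out{};
      result.valid[i] = fn.call(out, lhs.values[i], rhs.values[i]);
      result.values[i] = out;
    }
  }
  return result;
}
```

Calling `applyBinary<int64_t>(PlusFunction{}, a, b)` on two `Column<int64_t>` inputs yields a result column in which any row with a NULL input is NULL. Because the adapter is a template, the compiler can inline `call` into the loop body, which is the property that lets a row-based interface of this kind avoid per-row dispatch overhead.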