The wide variety of computing devices with very different properties makes it essential to develop software that is not only portable across them but also adapts to the properties of each platform. In this paper, we present the Heterogeneous Butterfly Processing Library (HBPL), which provides optimized, portable kernels for small problem sizes. These kernels make it possible to use orthogonal transform algorithms such as the FFT and DCT on different accelerators and regular CPUs. Our library is built on the OpenCL standard, which provides portability across a large number of platforms. Furthermore, high performance is achieved on a wide range of devices by exploiting run-time code generation and metaprogramming guided by a parametrization strategy. An exhaustive evaluation on different platforms shows that our proposal achieves performance that is competitive with, or better than, that of related libraries.