Abstract

Most existing sorting implementations are optimized by hand because compilers cannot generate optimized code for them, mainly because the information needed for optimization is not available at compile time; it becomes available only when the code executes. Specialization, however, can expose this information at compile time and thereby enable the compiler to perform the optimizations. This paper presents an automated approach that uses specialization to generate optimized sorting code for different architectures. The sorting kernel is iteratively specialized in a hierarchical way to produce an optimized version comprising a high-level kernel and three low-level kernels: the insertion, base, and merge kernels. The high-level kernel, working in conjunction with the low-level kernels, is embedded into a quicksort kernel and invoked when the data fit within the cache. Experiments with our optimization approach were performed on the Intel Core 2 Duo and IBM POWER4 (PowerPC) processors using the icc and gcc compilers, respectively. The sorting code optimized through hierarchical specialization executes fast and, in many cases, outperforms the manually optimized implementations.
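
The abstract names the kernels but not their internals, so the C sketch below shows only one plausible way the pieces could fit together; it is not the authors' generated code. It assumes that the base kernel is an insertion sort specialized to a fixed block size, that the insertion kernel handles the leftover tail, that the merge kernel combines sorted runs bottom-up, and that a quicksort driver partitions until a subarray fits in cache. All function names, the block size `BLOCK = 16`, and the 32 KiB threshold `CACHE_ELEMS` are hypothetical; the automated approach would derive such values by specialization rather than hard-coding them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters: fixed here, whereas the automated approach
   would select such values through specialization for each target. */
#define BLOCK 16                              /* base-kernel block size */
#define CACHE_ELEMS (32 * 1024 / sizeof(int)) /* assumed 32 KiB cache */

/* Insertion kernel: generic insertion sort for a run of any length n. */
static inline void insertion_kernel(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Base kernel: insertion sort specialized to the compile-time constant BLOCK.
   After inlining, the trip count is known, so the compiler may unroll it. */
static void base_kernel(int *a) { insertion_kernel(a, BLOCK); }

/* Merge kernel: merge sorted a[lo..mid) and a[mid..hi) through tmp. */
static void merge_kernel(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* High-level kernel for cache-resident data (n <= CACHE_ELEMS): sort
   BLOCK-sized runs with the base kernel and the tail with the insertion
   kernel, then merge the runs bottom-up. */
static void cache_kernel(int *a, int *tmp, size_t n) {
    size_t full = n - n % BLOCK;
    for (size_t i = 0; i < full; i += BLOCK) base_kernel(a + i);
    if (full < n) insertion_kernel(a + full, n - full);
    for (size_t width = BLOCK; width < n; width *= 2)
        for (size_t lo = 0; lo + width < n; lo += 2 * width) {
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_kernel(a, tmp, lo, lo + width, hi);
        }
}

/* Quicksort driver: Hoare-partition until a subarray fits the assumed
   cache size, then hand it to the high-level kernel. */
static void quick_driver(int *a, int *tmp, size_t n) {
    while (n > CACHE_ELEMS) {
        int pivot = a[(n - 1) / 2];
        ptrdiff_t i = -1, j = (ptrdiff_t)n;
        for (;;) {
            do i++; while (a[i] < pivot);
            do j--; while (a[j] > pivot);
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        size_t left = (size_t)j + 1; /* a[0..left) <= pivot <= a[left..n) */
        /* Recurse on the smaller side and loop on the larger one. */
        if (left < n - left) { quick_driver(a, tmp, left); a += left; n -= left; }
        else { quick_driver(a + left, tmp, n - left); n = left; }
    }
    cache_kernel(a, tmp, n);
}

/* Sorts a[0..n) ascending; returns 0 on success, -1 if allocation fails. */
int hierarchical_sort(int *a, size_t n) {
    if (n < 2) return 0;
    int *tmp = malloc((n < CACHE_ELEMS ? n : CACHE_ELEMS) * sizeof *tmp);
    if (tmp == NULL) return -1;
    quick_driver(a, tmp, n);
    free(tmp);
    return 0;
}
```

A caller would sort an array with `hierarchical_sort(data, n)`. In this sketch, quicksort is used only to shrink the working set; once a partition fits within the assumed cache size, the block-and-merge kernels operate entirely on cache-resident data, which is the role the abstract assigns to the high-level kernel.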
