Abstract

Fine-grain data parallelism is increasingly common in mainstream processors in the form of long vectors and on-chip GPUs. This paper develops compiler and runtime support to exploit such data parallelism for non-numeric, non-graphic, irregular parallel tasks that perform simple computations while traversing many independent, irregular data structures, like trees and graphs. We vectorize the traversal of trees and graphs by treating a set of irregular data structures as a parallel control-flow graph and compiling the traversal into a domain-specific bytecodes. We produce a SIMD interpreter for these bytecodes, so each lane of a SIMD unit traverses one irregular data structure. Despite the overhead of interpretation, we demonstrate significant increases in single-core performance over optimized baselines.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.