Abstract

With the advent of the big data era, highly efficient and scalable join algorithms are becoming increasingly essential for database operations. As a result, recent years have witnessed strong momentum in accelerating join algorithms with multi- and many-core processors. Among the various acceleration platforms, GPUs offer an advantage in raw computing power and scalability. The hash join problem, however, poses unique challenges for effective GPU implementations. In particular, a complete treatment of the problem that systematically considers GPU architectural details and input characteristics is still missing. In this work, we built a GPU-based testbed to systematically study the performance tradeoffs of developing highly efficient GPU implementations of hash join. On this basis, we investigated a set of essential building blocks, including data transfer mechanisms between host (CPU) and device (GPU) that take advantage of PCI-E bandwidth, a streaming scheme that effectively overlaps data transfer with kernel execution, and an atomic-free transformation that minimizes costly synchronization overhead. By integrating these blocks, we improve hash join performance to a new level. The experimental results show that our GPU implementation of hash join outperforms the state-of-the-art results by up to 111%. We also proposed a framework to guide the selection of optimization strategies.
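To make the streaming idea mentioned above concrete, the sketch below shows one common way to overlap host-device transfers with kernel execution in CUDA: pinned host buffers, chunked asynchronous copies, and multiple streams. The kernel name `probe_kernel`, the chunk sizes, and the stream count are illustrative assumptions, not the paper's actual implementation.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder for the probe phase of a hash join.
__global__ void probe_kernel(const int *keys, int *results, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        results[i] = keys[i];  // stand-in for the real probe logic
    }
}

int main() {
    const int N = 1 << 24;          // total tuples (assumed)
    const int CHUNK = 1 << 20;      // tuples per stream chunk (assumed)
    const int NSTREAMS = 4;

    int *h_keys, *h_results;
    // Pinned host memory is required for truly asynchronous copies.
    cudaMallocHost((void **)&h_keys, N * sizeof(int));
    cudaMallocHost((void **)&h_results, N * sizeof(int));

    int *d_keys, *d_results;
    cudaMalloc((void **)&d_keys, N * sizeof(int));
    cudaMalloc((void **)&d_results, N * sizeof(int));

    cudaStream_t streams[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) cudaStreamCreate(&streams[s]);

    // Issue copy / kernel / copy-back per chunk on rotating streams so that
    // PCI-E transfers of one chunk overlap with kernel execution of another.
    for (int off = 0, s = 0; off < N; off += CHUNK, s = (s + 1) % NSTREAMS) {
        int len = (off + CHUNK <= N) ? CHUNK : (N - off);
        cudaMemcpyAsync(d_keys + off, h_keys + off, len * sizeof(int),
                        cudaMemcpyHostToDevice, streams[s]);
        probe_kernel<<<(len + 255) / 256, 256, 0, streams[s]>>>(
            d_keys + off, d_results + off, len);
        cudaMemcpyAsync(h_results + off, d_results + off, len * sizeof(int),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < NSTREAMS; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_keys); cudaFree(d_results);
    cudaFreeHost(h_keys); cudaFreeHost(h_results);
    return 0;
}
```

This is only a minimal sketch of the stream-overlap pattern; the paper's actual building blocks (transfer mechanisms, atomic-free transformation, and their integration) are described in the full text.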
