Fault Tolerant Hash Join for Distributed Systems

Arsen Nasibullin

doi:10.32603/2071-2340-2022-4-68-82

Open Access

Journal: Computer tools in education
Publication Date: Dec 28, 2022
License type: cc-by

Abstract

Enterprises are increasingly moving data processing and analytical applications from well-equipped mainframes with highly available hardware components to clusters of commodity computers. Commodity machines are less reliable than expensive mainframes, so applications deployed on commodity clusters have to deal with frequent failures. These applications mostly execute complex client queries with aggregation and join operations. The longer a query runs, the more likely it is to be hit by a failure, and a failure forces the entire query to be re-executed. This paper presents a fault tolerant hash join (FTHJ) algorithm for distributed systems, implemented in Apache Ignite. FTHJ achieves fault tolerance by replicating data and materializing intermediate computations. To evaluate FTHJ, we also implemented an unreliable hash join algorithm as a baseline. Experimental results show that FTHJ takes at least 30% less time than the baseline to recover and complete the join operation when a failure occurs during execution. The paper also describes how we reached a compromise between minimizing the time spent on recovery tasks and the additional resources this requires.
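The general idea of materializing intermediate computations can be illustrated with a minimal single-process sketch. It is not the FTHJ algorithm from the paper and does not use the Apache Ignite API. The names `hash_join`, `partition`, `join_with_materialization`, `SimulatedFailure`, and the `store` dictionary are hypothetical, and `store` only stands in for a replicated store that would survive a node failure. The sketch joins partition by partition, saves each partition's result, and after a simulated failure recomputes only the partitions that were not yet saved.

```python
from collections import defaultdict


class SimulatedFailure(Exception):
    """Stands in for a node crash in this illustration."""


def hash_join(left, right):
    """Plain in-memory hash join on the first field of each tuple."""
    table = defaultdict(list)
    for key, value in left:  # build phase
        table[key].append(value)
    # probe phase
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


def partition(rows, n_parts):
    """Split rows into n_parts buckets by hashing the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[0]) % n_parts].append(row)
    return parts


def join_with_materialization(left, right, n_parts, store, fail_at=None):
    """Join partition by partition, saving each result in `store`.

    `store` is an assumption: a dict standing in for a replicated store.
    Partitions already present in `store` are skipped on a re-run.
    Returns the joined rows and the number of partitions computed in this run.
    """
    lparts = partition(left, n_parts)
    rparts = partition(right, n_parts)
    executed = 0
    for p in range(n_parts):
        if p in store:  # result materialized before the failure
            continue
        if p == fail_at:
            raise SimulatedFailure(p)
        store[p] = hash_join(lparts[p], rparts[p])
        executed += 1
    rows = [row for p in range(n_parts) for row in store[p]]
    return rows, executed


left = [(k, f"L{k}") for k in range(8)]
right = [(k, f"R{k}") for k in range(0, 8, 2)]
store = {}

try:
    # First attempt fails before partition 2 is processed.
    join_with_materialization(left, right, 4, store, fail_at=2)
except SimulatedFailure:
    pass

# Recovery run: only partitions 2 and 3 are computed.
result, executed = join_with_materialization(left, right, 4, store)
```

In this sketch the recovery run reuses the two partitions saved before the failure, whereas an unreliable baseline would have to repeat the whole join. The price is the extra storage for the saved partial results, which mirrors the compromise between recovery time and additional resources mentioned in the abstract.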

Full Text