Heuristic functions are an integral part of MapReduce software, in both Apache Hadoop and Spark. If the heuristic function performs badly, the load in the reduce phase is not balanced and access times spike. To investigate this problem more closely, we run an optimal database program with numerous different heuristic functions on a database, using the Amazon Elastic MapReduce framework. The paper covers the general purpose, implementation, and evaluation of heuristic algorithms for generating an optimal database system, including checksum and special heuristic functions. Alongside the analysis, we present the corresponding runtime results. On the coding side, the record-counting component was written hastily and currently works only on a local Hadoop installation; it can be debugged and optimized into a general-purpose implementation for Hadoop and Spark, and turned into an effective performance-monitoring tool. As mentioned before, some strange issues remain. In particular, BLAKE2s is unexpectedly slow in our runs, even though it is widely accepted that BLAKE2s performs much better than MD5 and SHA256. We would like to find out why the commonly expected performance of these heuristics differs from what we observed in distributed frameworks.
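
To make the load-balancing concern concrete, the following is a minimal local sketch, not the paper's implementation. It assumes the heuristic function is used to map each record key to a reducer, and that it is built on one of the digests named above (MD5, SHA256, BLAKE2s). The helper names `partition`, `reducer_loads`, and `imbalance` are hypothetical, and the synthetic keys stand in for real records.

```python
import hashlib
from collections import Counter

# Digest constructors from Python's standard hashlib module.
# Treating them as the "heuristic functions" is an assumption of this sketch.
HASHES = {
    "md5": hashlib.md5,
    "sha256": hashlib.sha256,
    "blake2s": hashlib.blake2s,
}


def partition(key: str, num_reducers: int, algo: str) -> int:
    """Map a key to a reducer index using the first 8 bytes of its digest."""
    digest = HASHES[algo](key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers


def reducer_loads(keys, num_reducers: int, algo: str):
    """Count how many keys land on each reducer."""
    counts = Counter(partition(k, num_reducers, algo) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


def imbalance(loads) -> float:
    """Ratio of the busiest reducer's load to the mean load (1.0 is ideal)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    # Synthetic keys, used only for illustration.
    keys = [f"record-{i}" for i in range(10_000)]
    for algo in HASHES:
        loads = reducer_loads(keys, 8, algo)
        print(algo, loads, round(imbalance(loads), 3))
```

An imbalance ratio well above 1.0 would indicate that one reducer receives far more records than the average, which is the skew described above. This sketch measures only the distribution of keys, not runtime on a cluster.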