Abstract

Recently, big-data-specific technologies have emerged, including domain-specific languages, software frameworks, databases, and third-party libraries. These technologies conceal low-level details behind high-level code, which is then passed through the conventional compilation cycle to generate hardware-executable code. The compiler offers several optimization opportunities that can help big data workloads meet their processing deadlines through better-optimized machine code. However, existing iterative compilation techniques are insufficient for exploring the optimization space of big data applications. To address this, a novel engine is presented for exploiting the compiler optimization space of big data workloads. The engine comprises a training phase and a testing phase. During training, the big data application is optimized with Mitigating the Compiler Phase-ordering (MiCOMP) and genetic algorithm (GA) optimization sequences, which are executed on training datasets. During testing, the test datasets are executed only with the best 300 optimization sequences discovered during training. The proposed engine has been evaluated on graph mining, machine learning, and text search categories of big data applications using a wide range of real-world and synthetic datasets. Overall, the engine is , , and faster than Iterative Optimization for the Data Center (IODC), MiCOMP, and GA, respectively, in exploiting the compiler search space for big data workloads. Further, integrating the best-10 and best-3 techniques with the engine brings speedups of and . Compiler-level exploitation of general-purpose machines incurs no extra overhead, no heavy computing, and no personnel cost. Moreover, the overall performance of specialized big data software solutions can be enhanced by compiling their high-level code with suitable compiler optimizations.
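The GA-driven search over compiler optimization sequences described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: the pass names, sequence length, and fitness function are all hypothetical stand-ins (a real engine would compile the workload with each sequence and measure its runtime on the training datasets).

```python
import random

# Hypothetical pool of compiler passes (illustrative only, not the paper's list).
PASSES = ["inline", "licm", "gvn", "loop-unroll", "sroa", "dce", "simplifycfg", "mem2reg"]
SEQ_LEN = 6  # assumed fixed sequence length for this sketch


def fitness(seq):
    """Stand-in for measured runtime on a training dataset (lower is better).

    In the real engine this would compile and run the big data workload;
    here a synthetic score keeps the sketch self-contained and runnable.
    """
    return sum((i + 1) * (hash(p) % 97) for i, p in enumerate(seq)) % 1000


def crossover(a, b):
    # Single-point crossover between two parent pass sequences.
    cut = random.randrange(1, SEQ_LEN)
    return a[:cut] + b[cut:]


def mutate(seq, rate=0.2):
    # Replace each pass with a random one at the given mutation rate.
    return [random.choice(PASSES) if random.random() < rate else p for p in seq]


def ga_search(pop_size=40, generations=25, keep_best=10, seed=0):
    """Evolve pass sequences; return the best `keep_best` found.

    Analogous to the training stage: the surviving top sequences would
    then be re-executed on the test datasets in the testing stage.
    """
    random.seed(seed)
    pop = [[random.choice(PASSES) for _ in range(SEQ_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)           # rank by synthetic "runtime"
        elite = pop[:keep_best]         # elitism: keep the best sequences
        children = [
            mutate(crossover(random.choice(elite), random.choice(elite)))
            for _ in range(pop_size - keep_best)
        ]
        pop = elite + children
    pop.sort(key=fitness)
    return pop[:keep_best]


best = ga_search()
print(len(best), [fitness(s) for s in best])
```

In the engine described by the abstract, the same idea is applied at a much larger scale: many such sequences are evaluated during training, and only the best 300 are carried forward to the testing stage.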