Abstract

This paper presents a parallelization framework for emerging applications on future chip multiprocessors (CMPs). As CMPs become prevalent and the number of on-die cores continues to increase, a key issue in harnessing their computational power is how to manage and execute many threads effectively at the same time. We therefore study a parallelization framework that comprises (1) coarse-grain and fine-grain multi-threading, (2) performance analysis, and (3) algorithmic changes. As an example, we show how the Hough Transform can be parallelized. Starting with a soccer analysis workload that relies heavily on the Hough Transform to detect lines on the soccer field, we extract coarse-grain data-level parallelism and examine its scaling performance on an 8-core symmetric multiprocessor machine. After identifying the factors that limit parallel performance, we exploit fine-grain data-level parallelism and evaluate its speedup on the 8-core machine and on a simulated 64-core CMP. Because of parallel overhead and demanding memory requirements, this fine-grain parallelization does not yield a significant performance improvement. We then propose a new Hough Transform and parallelize it in a fine-grain manner. Experimental data show that the new Hough Transform exposes a significant amount of concurrency and good data locality. On the simulated 64-core CMP, we achieve parallel scaling of 61.7x, enabling real-time Hough Transform.
