Abstract
Many-core processors are becoming mainstream computing platforms. Mapping application threads to specific processing cores so that the abundant hardware parallelism of a many-core processor is exploited efficiently has become a pressing need. This work proposes a data-affinity-based thread grouping and mapping strategy, Data Affinity Grouping based Thread Mapping (DagTM), which categorizes threads into groups according to their data affinity and the architectural features of the many-core processor, and then maps the thread groups to specific processing cores for energy-efficient execution. More specifically, first, intra-thread data locality is analyzed by computing the data reuse distance, and inter-thread data affinity is quantified with an affinity matrix. Second, the threads are categorized into groups via an affinity subtree spanning algorithm. Finally, the thread groups are assigned to different processing cores through static binding. DagTM reduces shared-memory access conflicts and additional data transmission, increases utilization of the computing resources, and reduces overall system energy consumption. Experimental results show that, compared with the traditional thread mapping mechanism, DagTM obtains nearly a 14% improvement in computing performance and nearly a 10% reduction in energy consumption, without introducing additional runtime overhead.
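As a rough illustration of the first two steps, the sketch below computes the reuse distance for one access in a per-thread trace of data-block IDs and fills an inter-thread affinity matrix by counting blocks touched by both threads. The trace format, block granularity, and the names reuse_distance() and build_affinity_matrix() are assumptions made for illustration, not DagTM's actual implementation.

```c
/* Simplified sketch of reuse-distance and inter-thread affinity computation
 * over per-thread traces of data-block IDs (illustrative only). */
#include <stdio.h>

#define NTHREADS  4
#define TRACE_LEN 8
#define NBLOCKS   16          /* number of distinct data blocks */

/* Reuse distance of access i: number of distinct blocks touched since the
 * previous access to the same block, or -1 if the block was never seen. */
static int reuse_distance(const int *trace, int i)
{
    int seen[NBLOCKS] = {0};
    int distinct = 0;
    for (int j = i - 1; j >= 0; j--) {
        if (trace[j] == trace[i])
            return distinct;
        if (!seen[trace[j]]) {
            seen[trace[j]] = 1;
            distinct++;
        }
    }
    return -1;                 /* cold access: no earlier use of this block */
}

/* Inter-thread affinity: count data blocks touched by both threads a and b.
 * A matrix of such counts quantifies which threads share data. */
static void build_affinity_matrix(int traces[NTHREADS][TRACE_LEN],
                                  int affinity[NTHREADS][NTHREADS])
{
    for (int a = 0; a < NTHREADS; a++)
        for (int b = 0; b < NTHREADS; b++) {
            int shared = 0;
            for (int blk = 0; blk < NBLOCKS; blk++) {
                int in_a = 0, in_b = 0;
                for (int k = 0; k < TRACE_LEN; k++) {
                    if (traces[a][k] == blk) in_a = 1;
                    if (traces[b][k] == blk) in_b = 1;
                }
                shared += (in_a && in_b);
            }
            affinity[a][b] = shared;
        }
}

int main(void)
{
    /* Hypothetical per-thread access traces (data-block IDs). */
    int traces[NTHREADS][TRACE_LEN] = {
        {0, 1, 2, 0, 1, 3, 0, 2},
        {0, 1, 4, 0, 5, 1, 4, 5},
        {6, 7, 8, 6, 7, 8, 6, 9},
        {6, 7, 10, 6, 11, 7, 10, 6},
    };

    /* Access 3 of thread 0 re-touches block 0 after 2 distinct blocks. */
    printf("reuse distance (thread 0, access 3): %d\n",
           reuse_distance(traces[0], 3));

    int affinity[NTHREADS][NTHREADS];
    build_affinity_matrix(traces, affinity);
    /* Threads 0 and 1 share blocks 0 and 1; threads 0 and 2 share none. */
    printf("affinity(0,1) = %d, affinity(0,2) = %d\n",
           affinity[0][1], affinity[0][2]);
    return 0;
}
```

Threads with high pairwise affinity would then be placed in the same group so that they reuse data from the same caches.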
Highlights
Improving computing performance and reducing energy consumption remain key problems in the high-performance computing domain [1]
Experimental results show that Data Affinity Grouping based Thread Mapping (DagTM) improves application performance by nearly 14%, and decreases energy consumption by 10% on average, compared with the traditional Operating System (OS) thread mapping mechanism for PARSEC [4] benchmark programs running on an Intel Many Integrated Core (MIC) co-processor
We implement the data-affinity-based thread grouping and mapping strategy DagTM on the Intel MIC platform
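The static binding step mentioned above can be illustrated with a minimal Linux/pthreads sketch: each worker thread pins itself to the core chosen for its group via pthread_setaffinity_np. The group-to-core table below is a made-up example; DagTM's actual assignment is produced by the affinity subtree spanning algorithm.

```c
/* Minimal sketch of static thread-to-core binding on Linux (illustrative). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NTHREADS 4

/* Hypothetical result of the grouping phase: thread i runs on core_of_thread[i]. */
static const int core_of_thread[NTHREADS] = {0, 0, 1, 1};

static void *worker(void *arg)
{
    long id = (long)arg;

    /* Pin this thread to the core chosen for its group. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_of_thread[id], &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    printf("thread %ld bound to core %d\n", id, core_of_thread[id]);
    /* ... compute on group-local data here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```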
Summary
Improving computing performance and reducing energy consumption remain key problems in the high-performance computing domain [1]. In an emerging heterogeneous many-core system composed of a host processor and a co-processor, the host processor handles complex logical control tasks (e.g., task scheduling, task synchronization, and data allocation), while the co-processor computes large-scale parallel tasks with high computing density and simple logic branches. The two processors cooperate on different portions of a program to improve its energy efficiency. All processing cores within each microprocessor share the same last-level cache and operate at the same frequency.
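A minimal sketch of this division of labour, assuming an Intel compiler with the Language Extensions for Offload (LEO) pragmas and a Xeon Phi (MIC) co-processor: the host allocates and initializes the data, and the offload region runs a dense OpenMP loop on the co-processor. Array names and sizes are made up for illustration; with other compilers the unknown pragma is ignored and the loop simply runs on the host.

```c
/* Sketch only: host/co-processor cooperation via Intel LEO offload. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 20;
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);

    /* Host side: logical control — data allocation and initialization. */
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Co-processor side: large, branch-free parallel computation. */
    #pragma offload target(mic:0) in(a : length(n)) in(b : length(n)) out(c : length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %.1f\n", c[42]);
    free(a); free(b); free(c);
    return 0;
}
```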