Abstract

Recently proposed pipelined multithreading (PMT) techniques have shown great applicability to parallelizing general programs on multi-core processors. However, the performance potential of these techniques is limited by large inter-core communication overheads, which become a bottleneck. This paper addresses that problem and presents a novel clustered pipelined multithreading (CPMT) technique that can construct efficient pipeline parallelism on commodity multi-core processors. The technique incorporates a clustered communication mechanism that greatly reduces average communication overheads (ACOs) using a software-only approach. We quantitatively demonstrate that the performance of CPMT can be improved by reducing the ACOs, and we describe its performance characteristics. We also give the stage decomposition procedure and provide a stage execution framework that can execute multiple stages within one procedure. The effectiveness of the CPMT technique has been evaluated on commodity AMD Phenom four-core processors. Experimental results show that our CPMT technique achieves speedups ranging from 116.8% to 219.8% on some typical loops extracted from SPEC CPU 2000 benchmark programs.
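
As a rough illustration of why grouping inter-stage transfers can lower the average communication overhead, the following minimal sketch runs a two-stage pipeline in which the first stage forwards its results in clusters rather than one value at a time. It is not the CPMT implementation: the function name `run_pipeline`, the parameter `cluster_size`, the per-stage work, and the use of Python threads with a shared queue are all assumptions made for illustration only.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def run_pipeline(items, cluster_size):
    """Two-stage pipeline; stage 1 forwards results in clusters of `cluster_size`.

    Returns the stage 2 results and the number of data transfers stage 1 made.
    """
    channel = queue.Queue()
    results = []
    transfers = 0  # data-carrying queue operations issued by stage 1

    def stage1():
        nonlocal transfers
        cluster = []
        for x in items:
            cluster.append(x * x)        # placeholder stage 1 work
            if len(cluster) == cluster_size:
                channel.put(cluster)     # one transfer carries many values
                transfers += 1
                cluster = []
        if cluster:                      # flush the partial last cluster
            channel.put(cluster)
            transfers += 1
        channel.put(SENTINEL)

    def stage2():
        while True:
            cluster = channel.get()
            if cluster is SENTINEL:
                break
            for y in cluster:
                results.append(y + 1)    # placeholder stage 2 work

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results, transfers
```

Calling `run_pipeline(range(10), 4)` moves ten values with three data transfers, whereas `run_pipeline(range(10), 1)` needs ten, so the fixed cost of each transfer is spread over more values as the cluster grows. The trade-off in a sketch like this is that larger clusters delay the point at which the second stage can start working.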
