Abstract

Network processors employ a multithreaded, chip-multiprocessing architecture to effectively hide memory latency and deliver high performance for packet-processing applications. In such a parallel paradigm, when multiple threads modify a shared variable in external memory, the threads must be properly synchronized so that accesses to the shared variable are protected by critical sections. Therefore, to efficiently harness the performance potential of network processors, it is critical to hide both memory latency and synchronization latency across the multiple threads and processors. In this paper, we present a novel program transformation used in the Intel® Auto-partitioning C Compiler for IXP, which performs optimal placement of memory access instructions and synchronization instructions for effective latency hiding. Experimental results show that the transformation delivers impressive speedup (up to 8.5x) and scalability (up to 72 threads) for a real-world network application (a 10 Gbps Ethernet Core/Metro Router).
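To illustrate the synchronization requirement the abstract describes, the sketch below shows multiple threads performing a read-modify-write on a shared variable inside a critical section. This is a minimal, generic example using POSIX threads, not the paper's IXP mechanism or compiler output; the names (packet_count, worker, NUM_THREADS) are illustrative assumptions. On a network processor, the compiler transformation described in the paper would additionally place the memory access and synchronization instructions so that their latency overlaps with independent work.

/*
 * Minimal sketch of the shared-variable synchronization problem,
 * using POSIX threads (NOT the IXP hardware mechanism or the
 * compiler transformation itself).
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS        8
#define UPDATES_PER_THREAD 100000

static long packet_count = 0;   /* shared variable (stands in for external memory) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < UPDATES_PER_THREAD; i++) {
        pthread_mutex_lock(&lock);    /* enter critical section */
        packet_count++;               /* protected read-modify-write */
        pthread_mutex_unlock(&lock);  /* leave critical section */
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("packet_count = %ld (expected %d)\n",
           packet_count, NUM_THREADS * UPDATES_PER_THREAD);
    return 0;
}

Without the lock, the concurrent increments would race and the final count would be unpredictable; with it, every access to the shared variable is serialized, which is exactly the latency the paper's transformation seeks to hide.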
