Abstract

This paper presents an efficient pipelined broadcasting algorithm that changes the inter-node transmission order according to the communication status of the processing nodes. When a broadcast operation is received, a local bus checks the remaining pre-existing transmission data size of each processing node; it then transmits data in an order changed according to this status information. The synchronization time can therefore be hidden within the time that remains until the pre-existing data transmissions finish, and as a result the overall broadcast completion time is reduced. Simulation results indicate that the proposed algorithm achieves a speed-up ratio of up to 1.423 over the previous algorithm. To demonstrate the feasibility of a physical implementation, a message passing engine (MPE) that implements the proposed broadcast algorithm and supports four processing nodes was designed in Verilog-HDL. Logic synthesis results with TSMC 0.18 μm process cell libraries show that the logic area of the proposed MPE is 2288.1 equivalent NAND gates, which is approximately 2.1% of the entire chip area. A performance improvement in multi-core processors is therefore expected at a small hardware area overhead.
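
The order-change idea described in the abstract can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the paper's pipelined algorithm or its Verilog-HDL design: it assumes the changed order serves nodes with the least remaining pre-existing data first, that remaining data is expressed directly in time units, and that the bus serves one node at a time. The names `reorder_targets`, `completion_time`, and `send_cost` are hypothetical.

```python
def reorder_targets(remaining):
    """Return node IDs sorted by remaining pre-existing data, smallest first.

    `remaining` maps node ID -> remaining pre-existing transmission size
    (assumed here to be in time units). Ties are broken by node ID.
    """
    return sorted(remaining, key=lambda node: (remaining[node], node))


def completion_time(order, remaining, send_cost):
    """Toy model: the bus serves one node at a time, and a node can start
    receiving only after its pre-existing transmission has finished."""
    bus_free = 0
    for node in order:
        start = max(bus_free, remaining[node])  # wait for bus and for node
        bus_free = start + send_cost
    return bus_free


# Hypothetical status of four processing nodes.
remaining = {0: 8, 1: 0, 2: 3, 3: 5}
fixed_order = [0, 1, 2, 3]
changed_order = reorder_targets(remaining)

fixed_time = completion_time(fixed_order, remaining, send_cost=4)
changed_time = completion_time(changed_order, remaining, send_cost=4)
print(changed_order, fixed_time, changed_time)
```

In this toy model the fixed order stalls on node 0 until its pre-existing transfer drains, whereas the changed order overlaps that waiting time with transmissions to the other nodes.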

Highlights

  • Multi-core and many-core processors have become the dominant processor models in many modern computer systems, including smartphones, tablet PCs, desktop computers, and even high-performance server systems [1,2]

  • The simulation was performed with the multi-core processor system and the bus functional models that support each broadcast algorithm

  • Since the sequential tree algorithm was presented, various broadcast algorithms have been proposed; these include the binary tree, binomial tree, minimum spanning tree, and distance minimum spanning tree algorithms, as well as van de Geijn's hybrid broadcast and modified hybrid broadcast algorithms. These algorithms offer performance efficiency by tuning network topologies in collective communications, but their restrictive structure prevents them from using the maximum bandwidth for message broadcast
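
For context, the binomial tree broadcast named in the list above can be sketched as a round-by-round schedule. This is a textbook formulation rather than code from the paper, and the function name `binomial_tree_schedule` is hypothetical. In each round, every node that already holds the message forwards it to one node that does not, so the broadcast finishes in ceil(log2 N) rounds for N nodes.

```python
def binomial_tree_schedule(num_nodes, root=0):
    """Return the rounds of a binomial tree broadcast.

    Each round is a list of (sender, receiver) pairs. Ranks are taken
    relative to `root`, so the root is relative rank 0.
    """
    rounds = []
    have = 1  # relative ranks 0..have-1 already hold the message
    while have < num_nodes:
        this_round = []
        for rel in range(have):
            peer = rel + have
            if peer < num_nodes:
                sender = (rel + root) % num_nodes
                receiver = (peer + root) % num_nodes
                this_round.append((sender, receiver))
        rounds.append(this_round)
        have *= 2  # the set of informed nodes doubles every round
    return rounds


schedule = binomial_tree_schedule(4)
for number, pairs in enumerate(schedule, start=1):
    print(f"round {number}: {pairs}")
```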


Summary

Introduction

Multi-core and many-core processors have become the dominant processor models in many modern computer systems, including smartphones, tablet PCs, desktop computers, and even high-performance server systems [1,2]. An interconnection network in multi-core processors transfers information from any source core to any desired destination core. This transfer should be completed with as small a latency as possible, and the network should allow a large number of such transfers to take place concurrently. Switches in the network help to send the information from the source core to the destination core. The network is specified by its topology, routing algorithm, switching strategy, and flow control mechanism. The leading companies in multi-core processors offer general-purpose processors, such as those in the I7 series, and high-performance computing processors

