Abstract
Interconnection networks have been deployed as the communication fabric in a wide range of parallel computer systems. With recent technological trends allowing growing quantities of chip resources and faster clock rates, increasing power consumption has become a prevailing concern and a major limiting factor in the design of parallel computer systems, from multiprocessor SoCs to multi-chip embedded systems and parallel servers. To tackle this, power-aware networks must become inherent components of single-chip and multi-chip systems.
On the hardware design side, while there has been some recent research on interconnection network power reduction, especially targeted towards communication links, the techniques presented are ad hoc and are not tailored to the application running on the network. We show that with these ad hoc techniques, power savings and the corresponding impact on network latency vary significantly from one application to the next; in many cases network performance can suffer severely. On the software side, extensive research on compile-time optimization has produced parallelizing compilers that can efficiently map an application onto hardware for high performance. However, research into power-aware parallelizing compilers is in its infancy, and none has addressed communication power.
In this paper, we take the first steps towards tailoring applications' communication needs at run-time for low power. We propose software techniques that extend the flow of a parallelizing compiler in order to direct run-time network power optimization. We target network links, the dominant power consumer in these systems, allowing dynamic voltage scaling (DVS) instructions extracted during static compilation to orchestrate link voltage and frequency transitions for power savings during application run-time. Concurrently, an online hardware mechanism measures network congestion levels and adapts these off-line DVS settings to optimize network performance.
Our simulations show that link power consumption can be reduced by up to 76.3%, with a minor increase in network latency of 0.23% to 6.78%, across a number of benchmark suites running on three existing parallel architectures, ranging from very fine-grained single-chip to coarse-grained multi-chip architectures.