Abstract

The best mapping of a task to one or more processing units in a heterogeneous system depends on multiple variables. Several approaches based on runtime systems have been proposed that determine the best mapping under given circumstances automatically. Some of them also consider dynamic events like varying problem sizes or resource competition that may change the best mapping during application runtime but only a few even consider that task execution may fail. While aging or overheating are well-known causes for sudden faults, the ongoing miniaturization and the growing complexity of heterogeneous computing are expected to create further threats for successful application execution. However, if properly incorporated, heterogeneous systems also offer the opportunity to recover from different types of faults in hardware as well as in software. In this work, we propose a combination of both topics, dynamic performance-oriented task mapping and dependability, to leverage this opportunity. As we will show, this combination not only enables tolerating faults in hardware and software with minor assistance of the developer, it also provides benefits for application development itself and for application performance in case of faults due to a new metric and automatic data management.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call