Tackling QoS-Induced Aging in Exascale Systems through Agile Path Selection
CODES '14: Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis
Association for Computing Machinery
Division of Computer and Network Systems, National Science Foundation, Division of Computing and Communication Foundations
Networks-on-Chip (NoCs) have become the standard communication platform for future massively parallel systems due to their performance, flexibility, and scalability advantages. However, reliability issues brought about by scaling in the sub-20nm era threaten to undermine the benefits offered by NoCs. In this paper, we show that QoS policies degrade the reliability profile of an exascale system. To mitigate this challenge, we propose Dynamic Wearout Resilient Routing (DWRR) algorithms for QoS-enabled exascale NoCs. Our proposal includes two novel DWRR algorithms enabled by a critical-path monitor and a broadcast-based routing configuration. Using PARSEC benchmarks, our best algorithm improves the QoS and the long-term sustainability (Mean Time To Failure) of the system by an average of 16% and 25%, respectively, compared to a state-of-the-art fault-tolerant technique. Copyright 2014 ACM.
Dean Michael Ancajas, Koushik Chakraborty, Sanghamitra Roy and Jason Allred, Tackling QoS-induced Aging in Exascale Systems through Agile Path Selection. ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS), pp. -0, October 2014, New Delhi, India.