Research Week 2023

Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures

Noel Daniel Gundi, Utah State University

Class

Article

College

College of Engineering

Department

Electrical and Computer Engineering Department

Faculty Mentor

Sanghamitra Roy

Presentation Type

Poster Presentation

Abstract

Shrinking technology nodes and the massive increase in data workloads have driven a swift migration of computing systems toward the Low-Power Computing (LPC) paradigm. Additionally, novel ASIC architectures have been explored to accelerate the repetitive yet enormous volume of AI instructions. Google’s Tensor Processing Unit (TPU) is one such architectural innovation, deployed commercially to speed up the processing of AI workloads. In the effort to achieve superior energy efficiency, Near-Threshold Computing (NTC) has emerged as an efficient LPC paradigm. Because the supply voltage is scaled down, NTC offers quadratic savings in power consumption compared with operating the system at its nominal voltage, i.e., Super-Threshold Computing (STC). However, NTC is extremely sensitive to Process Variation (PV). Moreover, the reduced speed of transistors at NTC degrades the overall performance of the system. Hence, the integration of NTC into the conventional semiconductor space has been restricted. In this work, distinct methodologies are explored to provide improved performance at NTC. Furthermore, the effects of PV, which go unnoticed at STC but pose a severe threat to the reliability of low-power AI computing, are addressed. This dissertation exploits the disparate computational delays of arithmetic units to provide up to 2.5× better performance and 1.35× better energy efficiency at NTC. Additionally, the distinct dataflow patterns of the TPU are statistically analyzed to apply selective voltage levels and further enhance the performance of the TPU. The homogeneous architecture of the TPU systolic array is also thoroughly investigated to design a low-overhead scheme for detecting faulty Processing Elements (PEs). The locality of a faulty PE is then used to tackle the impending faults.

Location

Logan, UT

Start Date

4-12-2023 2:30 PM

End Date

4-12-2023 3:30 PM

Included in

Electrical and Computer Engineering Commons
