Class
Article
College
College of Engineering
Department
Electrical and Computer Engineering Department
Faculty Mentor
Sanghamitra Roy
Presentation Type
Poster Presentation
Abstract
The shrinking technology node and the massive increase in data workloads have driven a swift migration of systems toward the Low-Power Computing (LPC) paradigm. Additionally, to accelerate the redundant yet mammoth AI instructions, novel ASIC design architectures have been explored. Google’s Tensor Processing Unit (TPU) is one such architectural innovation deployed in the commercial space to speed up the processing of AI workloads. In the effort to achieve superior energy efficiency, Near-Threshold Computing (NTC) has emerged as an efficient LPC paradigm. By underscaling the supply voltage, NTC offers quadratic savings in power consumption compared to operating the system at its nominal counterpart, i.e., Super-Threshold Computing (STC). However, NTC exhibits an extreme sensitivity to Process Variation (PV). Moreover, the reduced speed of transistors at NTC degrades the overall performance of the system. Hence, the integration of NTC into the conventional semiconductor workspace has been restricted. In this work, distinct methodologies are explored to provide improved performance at NTC. Furthermore, the effects of PV, which go unnoticed at STC but pose a severe threat to the reliability of low-power AI computing, are addressed. This dissertation exploits the disparate computational delays of arithmetic units to provide up to 2.5× improved performance and 1.35× better energy efficiency at NTC. Additionally, the distinct dataflow patterns of the TPU are statistically analyzed to employ selective voltage levels and further enhance the performance of the TPU. Also, the homogeneous architecture of the TPU systolic array is thoroughly investigated to design a low-overhead faulty Processing Element (PE) detection scheme. The locality of the faulty PE is later utilized to tackle the impending faults.
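As a rough illustration of the quadratic savings mentioned in the abstract, the standard dynamic-power model can be written out; the specific voltage values below are assumptions for illustration only and are not taken from the poster.

\[
  P_{\mathrm{dyn}} = \alpha \, C_{\mathrm{eff}} \, V_{dd}^{2} \, f
\]
% Holding the activity factor alpha, effective capacitance C_eff, and frequency f
% fixed, halving V_dd (e.g., an assumed 0.9 V at STC scaled to 0.45 V near
% threshold) cuts dynamic power to roughly one quarter:
\[
  \frac{P_{\mathrm{NTC}}}{P_{\mathrm{STC}}}
  \approx \left(\frac{0.45\,\mathrm{V}}{0.9\,\mathrm{V}}\right)^{2} = 0.25
\]
% In practice the achievable frequency also drops at the lower voltage, which is
% the NTC performance penalty the abstract describes.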
Location
Logan, UT
Start Date
4-12-2023 2:30 PM
End Date
4-12-2023 3:30 PM
Title
Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures