Date of Award:
Master of Science (MS)
Electrical and Computer Engineering
Recent increases in hard fault rates in modern chip multi-processors have led to a variety of approaches for preserving manufacturing yield. These include fine-grain fault tolerance (such as error correction coding, redundant cache lines, and redundant functional units) and large-grain fault tolerance (such as disabling faulty cores, adding extra cores, and core salvaging techniques). This paper considers core salvaging techniques and the heterogeneous performance that results when a chip contains both salvaged and non-faulty cores. It proposes a hypervisor-based hardware thread scheduler, triggered by detection of spin locks and thread imbalance, that mitigates the loss of throughput resulting from this heterogeneity. Specifically, a new algorithm, called the Most Progress Made algorithm, reduces the number of synchronization locks held on a salvaged core and balances the time each thread in an application spends running on that core. For some benchmarks, the results show as much as a 2.68x increase in performance over a salvaged chip multi-processor without this technique.
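The abstract does not spell out the selection rule, so the following is only an illustrative sketch of how a scheduler might choose which thread to place on a salvaged core, consistent with the two stated goals (few locks held on the salvaged core, balanced time on it). The `ThreadState` fields, the `pick_thread_for_salvaged_core` function, and the ordering of the criteria are all assumptions made for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Hypothetical per-thread bookkeeping (not from the thesis)."""
    tid: int
    locks_held: int          # synchronization locks the thread currently holds
    progress: float          # assumed progress metric, e.g. retired instructions
    salvaged_time: float     # accumulated time spent on the salvaged core


def pick_thread_for_salvaged_core(threads):
    """Choose the thread to run next on the salvaged (slower) core.

    Assumed policy: prefer threads holding the fewest locks, then the
    thread that has made the most progress, then the one that has spent
    the least time on the salvaged core so far.
    """
    if not threads:
        raise ValueError("no runnable threads")
    return min(
        threads,
        key=lambda t: (t.locks_held, -t.progress, t.salvaged_time, t.tid),
    )


if __name__ == "__main__":
    candidates = [
        ThreadState(tid=0, locks_held=1, progress=90.0, salvaged_time=0.0),
        ThreadState(tid=1, locks_held=0, progress=50.0, salvaged_time=2.0),
        ThreadState(tid=2, locks_held=0, progress=70.0, salvaged_time=5.0),
    ]
    chosen = pick_thread_for_salvaged_core(candidates)
    print(f"thread {chosen.tid} is moved to the salvaged core")
```

In this sketch, thread 0 is skipped despite being furthest ahead because it holds a lock, which would otherwise be held for longer on the slower core and delay the threads waiting on it. Placing the most advanced lock-free thread on the salvaged core lets lagging threads catch up on the faster cores.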
Dutson, Jacob J., "Most Progress Made Algorithm: Combating Synchronization Induced Performance Loss on Salvaged Chip Multi-Processors" (2013). All Graduate Theses and Dissertations. 1962.
Copyright for this work is retained by the student.