Date of Award:
8-2025
Document Type:
Thesis
Degree Name:
Master of Science (MS)
Department:
Mathematics and Statistics
Committee Chair(s)
Alan Wisler
Committee
Alan Wisler
Committee
Brennan Bean
Committee
Kevin Moon
Abstract
Classification tasks are fundamental in statistical machine learning. The general goal in such tasks is to build or select a model that classifies data with as few errors as possible. For a particular dataset, however, the minimum achievable number of errors is seldom zero, because overlap between the classes in the data makes some errors unavoidable. As a result, it is often difficult for machine learning practitioners and data scientists to know whether classification errors can be reduced through further refinement. A potential solution lies in the Bayes error rate (BER), the lowest error rate achievable by any classifier for a given set of features. If known, the BER would let data scientists gauge the performance of specific models relative to the limitations of the data and thus make more informed decisions about how much time to spend iterating on existing solutions. In general, the exact class distributions are unknown, so the BER cannot be determined exactly. Instead, research focuses on estimating or bounding the BER as closely as possible given the data. There is a wide variety of methods for bounding the BER. This thesis discusses several of these methods and aims to characterize how well they perform in different scenarios where the BER is known. In particular, we seek to quantify how often the true BER actually falls between the lower and upper bounds produced by the different methods in the literature. This characteristic has been neglected in the prior literature, and quantifying it would help establish the degree to which these bounds can be trusted as a reliable tool for classification problems.
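To make the idea of a "scenario where the BER is known" concrete, the following is a minimal sketch and is not taken from the thesis. It assumes a toy setting chosen only for illustration: two equally likely classes with univariate Gaussian features N(0, 1) and N(d, 1), d >= 0, for which the BER has the closed form Phi(-d/2). The function names are hypothetical. The bounds shown are the classical Bhattacharyya bounds computed from the true distributions; the thesis may study different bounds, and bounds estimated from finite samples need not contain the true BER.

```python
import math
import random


def true_ber(d):
    """Exact BER for N(0,1) vs N(d,1), equal priors, d >= 0: Phi(-d/2)."""
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))


def bhattacharyya_bounds(d):
    """Population Bhattacharyya bounds on the BER for the same toy setting."""
    bc = math.exp(-d * d / 8.0)  # Bhattacharyya coefficient, equal variances
    lower = 0.5 - 0.5 * math.sqrt(1.0 - bc * bc)  # equal priors: 4*p0*p1 = 1
    upper = 0.5 * bc                              # sqrt(p0*p1) * bc
    return lower, upper


def simulated_error(d, n=20000, seed=0):
    """Monte Carlo error of the Bayes rule (threshold at d/2) on sampled data."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        label = rng.random() < 0.5             # True means class 1, mean d
        x = rng.gauss(d if label else 0.0, 1.0)
        predicted = x > d / 2.0                # Bayes-optimal decision
        errors += predicted != label
    return errors / n


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 3.0):
        lo, hi = bhattacharyya_bounds(d)
        ber = true_ber(d)
        print(f"d={d}: lower={lo:.4f} BER={ber:.4f} upper={hi:.4f} "
              f"contained={lo <= ber <= hi} simulated={simulated_error(d):.4f}")
```

In this sketch the bounds are computed from the known distributions, so they always contain the BER. The question raised in the abstract concerns the harder case in which the bounds themselves are estimated from data.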
Checksum
d7c4629627003d5309cb6bdc655b2770
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
May, Riley, "Empirical Evaluation of Bayes Error Rate Bounds in Binary Classification" (2025). All Graduate Theses and Dissertations, Fall 2023 to Present. 522.
https://digitalcommons.usu.edu/etd2023/522
Copyright for this work is retained by the student.