Date of Award:

8-2025

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Mathematics and Statistics

Committee Chair(s)

Alan Wisler

Committee

Alan Wisler

Committee

Brennan Bean

Committee

Kevin Moon

Abstract

Classification is a fundamental task in statistical machine learning. The usual goal is to build or select a model that classifies data with as few errors as possible. For a given dataset, however, the minimum achievable error is seldom zero, because overlap between the class distributions makes some errors unavoidable. As a result, machine learning practitioners and data scientists often cannot tell whether classification errors can be reduced through further refinement. A potential solution lies in the Bayes error rate (BER), the lowest error rate achievable by any classifier for a given set of features. If known, the BER would let data scientists gauge the performance of specific models relative to the limitations of the data and make better-informed decisions about how much time to spend iterating on existing solutions. In general, the exact class distributions are unknown, so the BER cannot be determined exactly. Instead, research focuses on estimating or bounding the BER as closely as possible from the data. A wide variety of methods exist for bounding the BER. This thesis discusses several of them and aims to characterize how well they perform in scenarios where the BER is known. In particular, we seek to quantify how often the true BER actually falls between the lower and upper bounds produced by the different methods in the literature. This characteristic has been neglected in prior work, and measuring it would help establish the degree to which these bounds can be trusted as a reliable tool for classification problems.
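
To make the idea of checking bounds against a known BER concrete, here is a minimal sketch and not the thesis's own procedure. It assumes two univariate Gaussian classes with equal priors and equal variance, for which the BER has a closed form, and it uses the classical Bhattacharyya bounds as a stand-in, since the abstract does not name the bounds that are evaluated. The function names, the plug-in estimation of means and pooled variance, and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def true_ber(mu0, mu1, sigma):
    """Exact BER for two equal-prior, equal-variance univariate Gaussians."""
    return NormalDist().cdf(-abs(mu1 - mu0) / (2.0 * sigma))


def bhattacharyya_bounds(mu0, mu1, sigma, p0=0.5):
    """Bhattacharyya lower and upper bounds on the BER (equal-variance Gaussians)."""
    p1 = 1.0 - p0
    # Bhattacharyya coefficient between N(mu0, sigma^2) and N(mu1, sigma^2).
    bc = math.exp(-((mu1 - mu0) ** 2) / (8.0 * sigma ** 2))
    upper = math.sqrt(p0 * p1) * bc
    lower = 0.5 * (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * p0 * p1 * bc ** 2)))
    return lower, upper


def coverage(mu0=0.0, mu1=1.0, sigma=1.0, n=50, trials=500, seed=0):
    """Fraction of trials in which plug-in bounds contain the true BER."""
    rng = random.Random(seed)
    ber = true_ber(mu0, mu1, sigma)
    hits = 0
    for _ in range(trials):
        x0 = [rng.gauss(mu0, sigma) for _ in range(n)]
        x1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        m0, m1 = fmean(x0), fmean(x1)
        # Pooled standard deviation estimated from both samples (an assumption).
        ss = sum((x - m0) ** 2 for x in x0) + sum((x - m1) ** 2 for x in x1)
        s = math.sqrt(ss / (2 * n - 2))
        lo, hi = bhattacharyya_bounds(m0, m1, s)
        hits += lo <= ber <= hi
    return ber, hits / trials


if __name__ == "__main__":
    ber, cov = coverage()
    print(f"true BER = {ber:.4f}, empirical coverage = {cov:.3f}")
```

With the true parameters the bounds always bracket the BER. Once the parameters are estimated from a finite sample, the bracket can miss, and the fraction of trials in which it holds is the kind of quantity the abstract describes.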

Checksum

d7c4629627003d5309cb6bdc655b2770

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
