Date of Award:

5-2023

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Mathematics and Statistics

Committee Chair(s)

Alan Wisler

Committee

Alan Wisler

Committee

Yan Sun

Committee

Kevin Moon

Abstract

Many real-world problems require predicting ordinal variables, whose values form a set of ordered categories. In many of these cases, however, the categorical nature of the ordinal data is not a desirable property of the outcome. Regression models are therefore often used: they treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models can learn useful information between the discrete levels of the ordinal labels they are trained on, but complex models may fit the ordinal labels too closely and miss the information between levels. In this study, we use several datasets in which the outcome is continuous and generate ordinal labels from that variable. We examine the performance of two types of models, ordinal classification and continuous regression, to determine the effect of model complexity. The experiment confirms previous findings that regression models trained on the synthetic ordinal labels reach optimal performance on the continuous outcome at lower complexity than they reach optimal performance on the ordinal labels. Additionally, performance on the continuous outcome degrades from overfitting more quickly as complexity increases. This suggests that in machine learning settings where the goal is a continuous analog of ordinal training labels, models should be trained with less complexity than appears optimal from performance on the ordinal labels, because observed performance on the ordinal labels peaks only after performance on the underlying ground-truth continuous measure has already peaked.
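The experimental design described in the abstract can be illustrated with a small toy sketch. This is not the thesis's code, datasets, or models: it assumes a synthetic one-dimensional dataset, uses k-nearest-neighbor regression as a stand-in model family (smaller `k` meaning higher complexity), and scores with mean squared error. All function names (`to_ordinal`, `knn_predict`, `complexity_sweep`) are hypothetical. The sketch trains only on ordinal labels derived from a hidden continuous outcome, then reports held-out error against both the ordinal labels and the continuous outcome at each complexity setting.

```python
import math
import random


def to_ordinal(y, levels=5):
    """Round a continuous value to the nearest integer level in 0..levels-1."""
    return min(levels - 1, max(0, math.floor(y + 0.5)))


def knn_predict(train_x, train_t, x, k):
    """Average the targets of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_t[i] for i in nearest) / k


def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


def sample(rng, n):
    """Draw inputs, a noisy continuous outcome, and ordinal labels made from it."""
    xs = [rng.uniform(0.0, 4.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 0.5) for x in xs]   # assumed data-generating process
    labels = [to_ordinal(y) for y in ys]          # synthetic ordinal labels
    return xs, ys, labels


def complexity_sweep(ks, n_train=150, n_test=150, seed=0):
    """For each k, train on ordinal labels only; score on both held-out targets."""
    rng = random.Random(seed)
    train_x, _, train_lab = sample(rng, n_train)  # continuous outcome is hidden
    test_x, test_y, test_lab = sample(rng, n_test)
    results = []
    for k in ks:
        preds = [knn_predict(train_x, train_lab, x, k) for x in test_x]
        results.append((k, mse(preds, test_lab), mse(preds, test_y)))
    return results


if __name__ == "__main__":
    # Smaller k = more flexible (more complex) model in this stand-in family.
    for k, err_ordinal, err_continuous in complexity_sweep([1, 5, 15, 45]):
        print(f"k={k:3d}  MSE vs ordinal={err_ordinal:.3f}  "
              f"MSE vs continuous={err_continuous:.3f}")
```

Comparing the two error columns across `k` shows where each measure is minimized. Whether this toy setup reproduces the thesis's finding depends on the assumed noise level, binning, and model family.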

Checksum

ad2f9ec6498cd173330399d1c27da83b
