Date of Award:


Document Type:


Degree Name:

Master of Science (MS)


Mathematics and Statistics

Committee Chair(s)

Alan Wisler


Alan Wisler


Yan Sun


Kevin Moon


Many real-world problems require predicting ordinal variables, whose values form a set of ordered categories. In many of these cases, however, the categorical nature of the ordinal data is not desirable in the predicted outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models can learn useful information between the discrete levels of the ordinal labels they are trained on, but complex models may fit the ordinal labels too closely and miss the information between levels. In this study, we use several datasets in which the outcome is continuous and generate ordinal labels from this variable. The performance of two types of models, ordinal classification and continuous regression, is examined to determine the effect of model complexity. The experiment confirms previous findings that regression models trained on the synthetic ordinal labels reach optimal performance on the continuous outcome at a lower complexity than they reach optimal performance on the ordinal labels. Additionally, performance on the continuous outcome degrades from overfitting more quickly as complexity increases. This suggests that in machine learning settings where we would like to form a continuous analog of ordinal training labels, models should be trained with less complexity than appears optimal from the observed performance on the ordinal labels, because performance on the ordinal labels peaks after performance on the underlying ground-truth continuous measure does.
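
The experimental setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the simulated data, quantile binning as the way ordinal labels are generated, polynomial degree as the stand-in for model complexity, Pearson correlation as the performance measure, and the helper names `make_ordinal` and `complexity_sweep`.

```python
import numpy as np


def make_ordinal(y, n_levels=5):
    """Bin a continuous outcome into ordered levels 0..n_levels-1.

    Quantile binning is an assumption; the thesis may generate labels differently.
    """
    # Interior quantile cut points (n_levels - 1 of them).
    edges = np.quantile(y, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(y, edges)


def complexity_sweep(degrees=(1, 3, 5, 9), n=400, n_levels=5, seed=0):
    """Fit regressions of increasing complexity on ordinal labels only.

    Returns {degree: (corr with held-out ordinal labels,
                      corr with held-out continuous outcome)}.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical data: one feature and a noisy continuous outcome.
    x = rng.uniform(-1, 1, size=n)
    y_cont = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    y_ord = make_ordinal(y_cont, n_levels)

    # Simple half/half split into training and held-out data.
    train = np.arange(n) < n // 2
    test = ~train

    results = {}
    for d in degrees:
        # The model sees only the ordinal labels during training.
        coefs = np.polyfit(x[train], y_ord[train], deg=d)
        pred = np.polyval(coefs, x[test])
        corr_ord = np.corrcoef(pred, y_ord[test])[0, 1]
        corr_cont = np.corrcoef(pred, y_cont[test])[0, 1]
        results[d] = (corr_ord, corr_cont)
    return results


if __name__ == "__main__":
    for degree, (c_ord, c_cont) in complexity_sweep().items():
        print(f"degree={degree}: ordinal r={c_ord:.3f}, continuous r={c_cont:.3f}")
```

Comparing the two correlations across degrees shows the kind of comparison the study makes: the complexity at which held-out performance peaks can be read off separately for the ordinal labels and for the continuous outcome. This toy simulation is not expected to reproduce the thesis results.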