Date of Award:
Master of Science (MS)
Mathematics and Statistics
Many real-world problems require predicting ordinal variables, whose values are categories with a natural ordering. In many of these cases, however, categorical predictions are not the desired outcome. Regression models address this by treating ordinal variables as continuous, so their predictions are not bound to discrete categories. Prior research has found that these models can learn useful information between the discrete levels of the ordinal labels they are trained on, but that complex models may fit the ordinal labels too closely and miss the information between levels. In this study, we use several datasets in which the outcome is continuous and generate ordinal labels from that outcome. We examine the performance of two types of models, ordinal classification and continuous regression, to determine the effect of model complexity. The experiment confirms previous findings that regression models trained on the synthetic ordinal labels reach optimal performance on the continuous outcome at lower complexity than they reach optimal performance on the ordinal labels. Additionally, the former overfit more quickly as complexity increases. This suggests that in machine learning settings where we would like to form a continuous analog of ordinal training labels, models should be trained with less complexity than appears optimal from the observed performance on the ordinal labels, because performance on the ordinal labels peaks at higher complexity than performance on the underlying ground-truth continuous measure.
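The experimental design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the synthetic data-generating function, the equal-frequency binning into five levels, the use of k-nearest-neighbour regression as the model family (smaller `k` meaning a more flexible model), and Pearson correlation as a scale-free score. The sketch trains only on the generated ordinal labels and then scores the same predictions against both the held-out ordinal labels and the held-out continuous outcome.

```python
import math
import random
from bisect import bisect_right

def quantile_cuts(values, n_levels):
    """Equal-frequency cut points that split values into n_levels ordered bins."""
    ranked = sorted(values)
    return [ranked[len(ranked) * i // n_levels] for i in range(1, n_levels)]

def to_ordinal(values, cuts):
    """Map each continuous value to the index (0, 1, ...) of the bin it falls in."""
    return [bisect_right(cuts, v) for v in values]

def knn_predict(x_train, t_train, x_new, k):
    """1-D k-nearest-neighbour regression: average the targets of the k closest points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(t_train[i] for i in nearest) / k

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    denom = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return cov / denom if denom > 0 else 0.0

rng = random.Random(0)

def sample(n):
    """Hypothetical data: one feature and a noisy continuous outcome."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [0.3 * x + math.sin(x) + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

x_train, y_train = sample(200)
x_test, y_test = sample(100)

# Generate synthetic ordinal labels from the continuous outcome.
cuts = quantile_cuts(y_train, 5)
lab_train = to_ordinal(y_train, cuts)
lab_test = to_ordinal(y_test, cuts)

# Train on ordinal labels only; score against both targets at each complexity.
scores = {}
for k in (1, 5, 25, 100):  # smaller k = more complex (more flexible) model
    preds = [knn_predict(x_train, lab_train, x, k) for x in x_test]
    scores[k] = (pearson(preds, lab_test), pearson(preds, y_test))
    print(f"k={k:3d}  vs ordinal labels: {scores[k][0]:.3f}  "
          f"vs continuous outcome: {scores[k][1]:.3f}")
```

Comparing the two score columns across values of `k` shows where each target's performance peaks in this toy setting. Whether the pattern matches the thesis's findings depends on the data and model family, so the sketch should be read as a template for the comparison rather than a reproduction of the results.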
Thomas, McKade S., "Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels" (2023). All Graduate Theses and Dissertations. 8784.
Copyright for this work is retained by the student.