Date of Award:
Doctor of Philosophy (PhD)
Mathematics and Statistics
D. Richard Cutler
Chris D. Corcoran
Machine learning is a buzz word that has inundated popular culture in the last few years. This is a term for a computer method that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an R software package called EZtune that can be used to automatically tune three widely used machine learning algorithms: support vector machines, gradient boosting machines, and adaboost.
The second portion of this dissertation investigates the implementation of machine learning methods in finding locations along a genome that are associated with a trait. The performance of methods that have been commonly used for these types of studies, and some that have not been commonly used, are assessed using simulated data. The affect of the strength of the relationship between the genetic code and the trait is of particular interest. It was found that the strength of this relationship was the most important characteristic in the efficacy of each method.
Lundell, Jill F., "Tuning Hyperparameters in Supervised Learning Models and Applications of Statistical Learning in Genome-Wide Association Studies with Emphasis on Heritability" (2019). All Graduate Theses and Dissertations. 7594.
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .