Date of Award:


Document Type:


Degree Name:

Doctor of Philosophy (PhD)


Mathematics and Statistics

Committee Chair(s)

D. Richard Cutler


Chris D. Corcoran


Adele Cutler


Zachariah Gompert


Jürgen Symanzik


Machine learning is a buzzword that has inundated popular culture in the last few years. The term refers to computer methods that automatically learn and improve from data instead of being explicitly programmed at every step. How best to create and use these methods is an active area of research. Machine learning models can be difficult to build because they need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and presents a way to automatically select a set of tuning parameters. This information was used to create an R software package called EZtune that automatically tunes three widely used machine learning algorithms: support vector machines, gradient boosting machines, and AdaBoost.

The second portion of this dissertation investigates the use of machine learning methods to find locations along a genome that are associated with a trait. The performance of methods commonly used for these types of studies, and of some that are not, is assessed using simulated data. The effect of the strength of the relationship between the genetic code and the trait is of particular interest. The strength of this relationship was found to be the most important factor in the efficacy of each method.