Date of Award: 12-2017
Document Type: Dissertation
Degree Name: Doctor of Philosophy (PhD)
Department: Mathematics and Statistics
Committee Chair(s): Adele Cutler
Committee: Adele Cutler, Christopher Corcoran, Richard Cutler, Jürgen Symanzik, Kyumin Lee
Abstract
The motivation of my dissertation is to address two weaknesses of Random Forests. The first is their failure to detect genetic interactions between two single nucleotide polymorphisms (SNPs) in high-dimensional data when both interacting SNPs have weak main effects. The second is that they are harder to interpret than parametric methods such as logistic regression, linear discriminant analysis, and linear regression.
We focus on detecting pairwise SNP interactions in genomic case-control studies. We determine the parameter settings that optimize the detection of SNP interactions and improve the efficiency of Random Forests, and we present an efficient filtering method. Compared with leading methods, the filtering method is computationally faster and retains good detection power.
Visualizing Random Forests proximities allows us to identify clusters, outliers, and important features for subgroups of observations. We improve the interpretation of Random Forests through new proximities. Because the new proximities are asymmetric, an appropriate visualization requires an asymmetric model. We propose a new visualization technique for asymmetric data and compare it with existing approaches.
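For readers unfamiliar with proximities, the sketch below shows the standard (Breiman-style) Random Forests proximity: the fraction of trees in which two observations land in the same terminal node. It assumes the terminal-node indices are already available as an observations-by-trees array (scikit-learn's `RandomForestClassifier.apply` returns such an array). The toy `leaves` matrix and the asymmetric matrix `A` are made-up illustrations, not data from the dissertation. The dissertation's new asymmetric proximities and its proposed visualization are not reproduced here. The helper `split_asymmetric` shows one generic, well-known fact: any square matrix splits into a symmetric part plus a skew-symmetric part. Standard proximities have a zero skew-symmetric part, whereas asymmetric proximities carry information there that a purely symmetric model would discard.

```python
import numpy as np


def rf_proximity(leaves):
    """Standard Random Forests proximity (symmetric).

    leaves[i, t] is the terminal-node index of observation i in tree t.
    Returns prox with prox[i, j] = fraction of trees in which i and j
    share a terminal node.
    """
    # Compare every pair of observations tree by tree: shape (n, n, n_trees).
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def split_asymmetric(P):
    """Decompose a square matrix as P = S + K (S symmetric, K skew-symmetric)."""
    S = (P + P.T) / 2.0
    K = (P - P.T) / 2.0
    return S, K


# Toy terminal-node indices: 4 observations (rows) x 3 trees (columns).
leaves = np.array([[0, 2, 1],
                   [0, 2, 3],
                   [1, 2, 3],
                   [1, 0, 3]])
P = rf_proximity(leaves)
print(np.round(P, 3))

# The standard proximity is symmetric, so its skew-symmetric part vanishes.
S, K = split_asymmetric(P)
print(np.allclose(K, 0.0))  # True

# A hypothetical asymmetric proximity matrix keeps information in K,
# which any display built only from S would ignore.
A = np.array([[1.0, 0.8],
              [0.4, 1.0]])
S_a, K_a = split_asymmetric(A)
print(K_a)  # off-diagonal entries +0.2 and -0.2
```

In the toy example, observations 0 and 1 share a terminal node in two of the three trees, so their proximity is 2/3. Observations 0 and 3 never share one, so their proximity is 0.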
Checksum
29f31819d8220477f15dee116632d889
Recommended Citation
Quach, Anna, "Extensions and Improvements to Random Forests for Classification" (2017). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 6755.
https://digitalcommons.usu.edu/etd/6755
Copyright for this work is retained by the student.