Date of Award:

9-2017

Document Type:

Dissertation

Degree Name:

Doctor of Philosophy (PhD)

Department:

Mathematics and Statistics

Advisor/Chair:

Adele Cutler

Abstract

The motivation of my dissertation is to address two weaknesses of Random Forests: first, the failure to detect genetic interactions between two single nucleotide polymorphisms (SNPs) in higher dimensions when both interacting SNPs have weak main effects; and second, the difficulty of interpretation compared with parametric methods such as logistic regression, linear discriminant analysis, and linear regression.

We focus on detecting pairwise SNP interactions in genome case-control studies. We determine the parameter settings that best detect SNP interactions and improve the efficiency of Random Forests, and we present an efficient filtering method. The filtering method is compared to leading methods and is shown to be computationally faster while retaining good detection power.

Random Forests allows us to identify clusters, outliers, and important features for subgroups of observations through the visualization of the proximities. We improve the interpretation of Random Forests through the proximities. The resulting new proximities are asymmetric, so visualizing them appropriately requires an asymmetric model. We propose a new visualization technique for asymmetric data and compare it to existing approaches.

Checksum

08565221494f56dd2aafb7e170ec53ba

Included in

Mathematics Commons
