Date of Award

5-2015

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee Chair(s)

Giufang Fu

Committee

Giufang Fu

Committee

David Brown

Committee

Daniel Coster

Abstract

This paper presents improved methods for analyzing genome-wide association studies in contemporary genetic research. Thanks to current sequencing methods, half a million to one million single-nucleotide polymorphisms (SNPs) can feasibly be generated within any given population, and SNPs are often correlated with one another, so truly causative loci can be confounded with correlated neighboring loci. Additionally, complex traits are often jointly affected by multiple genetic variants, each with a small or moderate individual effect. To address these issues in genome-wide association studies, we propose a novel statistical approach, DCRR, to detect significant associations between large numbers of SNPs and phenotypes. We applied DCRR to simulations that varied in marker allele frequencies, linkage disequilibrium, and the number of SNPs considered, and we analyzed a previously published Arabidopsis thaliana dataset for the binary AvrRpm1 trait. The distance correlation was effective in ranking SNPs, while the logistic ridge regression detected causative SNPs without including spurious correlated neighbors. Our results indicate that DCRR is an effective and reliable method that can improve the accuracy and efficiency of analyzing large association datasets.
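
The two-stage idea in the abstract (rank SNPs by distance correlation, then fit a logistic ridge regression on the top-ranked SNPs) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the screen uses the standard sample distance correlation and that the ridge fit is a plain L2-penalized logistic regression solved by Newton's method. The genotypes are simulated as independent SNPs (no linkage disequilibrium), and the function names, the penalty `lam`, and the cutoff `top_k` are hypothetical choices made only for illustration.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()                       # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def ridge_logistic(X, y, lam=1.0, n_iter=50):
    """L2-penalized logistic regression via Newton's method.

    Returns [intercept, coefficients...]; the intercept is not penalized.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = np.clip(Xb @ beta, -30, 30)        # guard against overflow
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        grad = Xb.T @ (y - mu) - penalty @ beta
        hess = Xb.T @ (Xb * w[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

# Simulated data (an assumption for illustration, not the thesis design):
# 300 individuals, 40 independent SNPs coded 0/1/2, one causal SNP.
rng = np.random.default_rng(0)
n, p, causal = 300, 40, 7
genotypes = rng.binomial(2, 0.3, size=(n, p)).astype(float)
prob = 1.0 / (1.0 + np.exp(-(-1.5 + 2.0 * genotypes[:, causal])))
trait = rng.binomial(1, prob).astype(float)      # binary phenotype

# Stage 1: rank SNPs by distance correlation with the trait.
scores = np.array([distance_correlation(genotypes[:, j], trait) for j in range(p)])
top_k = 5
screened = np.argsort(scores)[::-1][:top_k]

# Stage 2: ridge-penalized logistic fit on the screened SNPs only.
beta = ridge_logistic(genotypes[:, screened], trait, lam=1.0)
for snp, coef in zip(screened, beta[1:]):
    print(f"SNP {snp}: dCor={scores[snp]:.3f}, ridge coef={coef:.3f}")
```

Screening first keeps the penalized fit small, which is one plausible reason to combine the two steps when the number of SNPs far exceeds the sample size.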
