An Integrated Approach to Exploit Linkage Disequilibrium for Ultra High Dimensional Genome-wide Data
Date of Award
5-2015
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematics and Statistics
Committee Chair(s)
Giufang Fu
Committee
Giufang Fu
Committee
David Brown
Committee
Daniel Coster
Abstract
This paper presents improved methods for the analysis of genome-wide association studies in contemporary genetic research. With current sequencing methods, half a million to one million single-nucleotide polymorphisms (SNPs) can feasibly be generated within a given population, and SNPs are often correlated with one another, so that truly causative loci are confounded by correlated neighboring loci. Additionally, complex traits are often jointly affected by multiple genetic variants, each having a small or moderate individual effect. To address these issues in genome-wide association studies, we propose a novel statistical approach, DCRR, to detect significant associations between large numbers of SNPs and phenotypes. We applied DCRR to simulations that varied in marker allele frequencies, linkage disequilibrium, and the number of SNPs considered, and we analyzed a previously published Arabidopsis thaliana dataset with a binary AvrRpm1 trait. The distance correlation was effective in ranking SNPs, while the logistic ridge regression detected causative SNPs without including spurious correlated neighbors. Our results indicate that DCRR is an effective and reliable method that can improve the accuracy and efficiency of analyzing large association datasets.
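The two stages named in the abstract, ranking SNPs by distance correlation and then fitting a logistic ridge regression, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`distance_correlation`, `screen_snps`, `ridge_logistic`), the number of SNPs retained `k`, the penalty `lam`, and the simulated genotypes are all assumptions chosen for illustration, and the abstract does not state how DCRR selects these settings or assesses significance.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (value in [0, 1])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre each distance matrix
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def screen_snps(G, y, k):
    """Return the indices of the k SNP columns with the largest distance correlation to y."""
    scores = np.array([distance_correlation(G[:, j], y) for j in range(G.shape[1])])
    return np.argsort(scores)[::-1][:k]

def ridge_logistic(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """L2-penalised logistic regression by Newton's method (intercept unpenalised)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0  # leave the intercept out of the penalty
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30.0, 30.0)
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        grad = Xd.T @ (y - mu) - P @ beta
        hess = Xd.T @ (Xd * w[:, None]) + P
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical simulated data: 200 individuals, 30 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 200, 30
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
# SNP 6 mostly copies SNP 5 to mimic linkage disequilibrium with the causal locus.
G[:, 6] = np.where(rng.random(n) < 0.9, G[:, 5], G[:, 6])
# Binary phenotype driven only by SNP 5 (assumed effect size).
prob = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * G[:, 5])))
y = (rng.random(n) < prob).astype(float)

top = screen_snps(G, y, k=5)               # stage 1: rank and keep the top SNPs
beta = ridge_logistic(G[:, top], y, 1.0)   # stage 2: joint penalised fit
print("retained SNPs:", top)
print("ridge coefficients (intercept first):", np.round(beta, 3))
```

In this sketch the screening step reduces the number of candidate SNPs before the joint fit, and the joint ridge fit is where a causal SNP and its correlated neighbor compete for the same signal. How strongly the neighbor's coefficient shrinks depends on the assumed penalty and on the strength of the correlation.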
Recommended Citation
Carlsen, Michelle, "An Integrated Approach to Exploit Linkage Disequilibrium for Ultra High Dimensional Genome-wide Data" (2015). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 529.
https://digitalcommons.usu.edu/gradreports/529
Copyright for this work is retained by the student.