Date of Award:

12-2018

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Mathematics and Statistics

Committee Chair(s)

John Stevens

Committee

John Stevens

Committee

Richard Cutler

Committee

Yan Sun

Abstract

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes is either active or inactive. Recent technology allows us to measure the activity level of genes in cells, known as gene expression. It is of great interest to society to be able to statistically compare the expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with that of a group of non-cancer patients to better understand the genetic causes of that particular cancer. Understanding these genetic causes could potentially lead to improved treatment options.

Initially, gene expression was tested for statistical differences on a per-gene level. In more recent years, it has been argued that grouping genes by biological process into gene sets, and comparing groups at the gene set level, is likely more biologically meaningful. A number of gene set test methods have since been developed, so it is critically important to know whether these methods are accurate.

In this research, we compare the accuracy of a group of popular gene set test methods across a range of biologically realistic scenarios. To measure accuracy, we need to know whether each gene set is truly differentially expressed. Because this is not known for real gene expression data, we use simulated data. We develop a simulation framework that generates gene expression data representative of actual gene expression data, and we use it to evaluate each gene set test method over a range of biologically relevant scenarios. We then compare the power and false discovery rate of each method across these scenarios.
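Because the simulated data carry a known truth label for every gene set, power and false discovery rate can be computed directly from each method's significance calls. The abstract does not state the exact computation used, so the following is only a minimal sketch under stated assumptions: each simulated replicate yields one boolean "truly differentially expressed" label and one boolean "called significant" decision per gene set. The function names `power_and_fdp` and `average_over_replicates` are hypothetical, and the convention that the false discovery proportion is 0 when nothing is rejected is an assumption (it matches the usual Benjamini-Hochberg definition).

```python
from typing import Sequence


def power_and_fdp(is_de: Sequence[bool], rejected: Sequence[bool]) -> tuple[float, float]:
    """Return (power, false discovery proportion) for one simulated replicate.

    Hypothetical helper. is_de[i] is True if gene set i was simulated as
    differentially expressed; rejected[i] is True if the gene set test method
    called gene set i significant.
    """
    if len(is_de) != len(rejected):
        raise ValueError("is_de and rejected must have the same length")
    true_pos = sum(1 for d, r in zip(is_de, rejected) if d and r)
    false_pos = sum(1 for d, r in zip(is_de, rejected) if not d and r)
    n_de = sum(1 for d in is_de if d)
    n_rejected = true_pos + false_pos
    # Power: share of truly DE gene sets that the method detected (NaN if none are DE).
    power = true_pos / n_de if n_de > 0 else float("nan")
    # FDP: share of rejections that are false; assumed 0 when nothing is rejected.
    fdp = false_pos / n_rejected if n_rejected > 0 else 0.0
    return power, fdp


def average_over_replicates(results: Sequence[tuple[float, float]]) -> tuple[float, float]:
    """Average (power, FDP) pairs over replicates; the mean FDP estimates the FDR."""
    if not results:
        raise ValueError("need at least one replicate")
    mean_power = sum(p for p, _ in results) / len(results)
    mean_fdp = sum(f for _, f in results) / len(results)
    return mean_power, mean_fdp


if __name__ == "__main__":
    # Toy replicate: 3 truly DE gene sets out of 8; the method calls 3 significant.
    truth = [True, True, True, False, False, False, False, False]
    decision = [True, True, False, True, False, False, False, False]
    print(power_and_fdp(truth, decision))
```

In the toy replicate, two of the three truly differentially expressed sets are detected (power 2/3), and one of the three rejections is a false discovery (FDP 1/3). Averaging the FDP over many simulated replicates gives an estimate of the false discovery rate for a given scenario.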

Checksum

3bc84a686682ecf48cedb064314539a4

Recommended Citation

Lambert, Richard M., "Comparing Performance of Gene Set Test Methods Using Biologically Relevant Simulated Data" (2018). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 7377.
https://digitalcommons.usu.edu/etd/7377

Included in

Applied Statistics Commons

Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .

DOI

https://doi.org/10.26076/ff83-7639