Document Type

Article

Journal/Book Title/Conference

BMC Bioinformatics

Volume

18

Issue

212

Publisher

BioMed Central

Publication Date

4-12-2017

DOI

10.1186/s12859-017-1617-9

Abstract

Background
Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs is correlated with a particular complex trait and is important for predicting that trait. Efficiently and accurately selecting these influential SNPs from millions of candidates is in high demand but remains challenging. We propose a backward elimination iterative distance correlation (BE-IDC) procedure to select the smallest subset of SNPs that guarantees sufficient prediction accuracy, while also resolving the unclear-threshold issue of traditional feature screening approaches.

Results
In six simulations, the adaptive threshold estimated by BE-IDC performed uniformly better than the fixed-threshold methods used in the current literature. We also applied BE-IDC to an Arabidopsis thaliana genome-wide data set. Out of 216,130 SNPs, BE-IDC selected four influential SNPs and confirmed the same FRIGIDA gene that was reported by two other traditional methods.

Conclusions
BE-IDC accommodates both the prediction accuracy and the computational speed that are in high demand in genomic selection.
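
Illustrative sketch
The abstract does not spell out the BE-IDC steps, so the following is not the authors' procedure. It is only a minimal sketch of the standard sample distance correlation statistic that distance-correlation screening builds on, used here to rank SNP columns by their marginal dependence with a trait. The function names (`distance_correlation`, `rank_snps`) and the genotype coding (one numeric column per SNP) are assumptions made for illustration; the backward elimination step and the adaptive threshold are deliberately left out.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat a 1-D input as n samples of dimension 1
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1])."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0           # a constant variable carries no dependence signal
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_snps(genotypes, trait):
    """Hypothetical helper: score each SNP column against the trait.

    genotypes: array of shape (n_samples, n_snps), assumed numerically coded.
    Returns (column indices sorted by decreasing score, scores per column).
    """
    genotypes = np.asarray(genotypes)
    scores = np.array([
        distance_correlation(genotypes[:, j], trait)
        for j in range(genotypes.shape[1])
    ])
    return np.argsort(-scores), scores
```

A fixed-threshold screen would keep the top k columns of this ranking for some preset k; according to the abstract, BE-IDC instead estimates the cutoff adaptively. Each score builds an n-by-n distance matrix, so memory grows quadratically with the number of samples.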

Included in

Mathematics Commons
