Date of Award:

5-2009

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Mathematics and Statistics

Department name when degree awarded

Statistics

Committee Chair(s)

D. Richard Cutler

Committee

D. Richard Cutler

Committee

Adele Cutler

Committee

Brynja Kohler

Abstract

Random Forests (RF) (Breiman 2001; Breiman and Cutler 2004) is a completely nonparametric statistical learning procedure that may be used for regression analysis and classification. A feature of RF that is drawing considerable attention is the novel algorithm used to evaluate the relative importance of the predictor (explanatory) variables. Other machine learning algorithms for regression and classification, such as support vector machines and artificial neural networks (Hastie et al. 2009), exhibit high predictive accuracy but provide little insight into the predictive power of individual variables. In contrast, the permutation algorithm of RF has already established a track record for identifying important predictors (Huang et al. 2005; Cutler et al. 2007; Archer and Kimes 2008). Recently, however, some authors (Nicodemus and Shugart 2007; Strobl et al. 2007, 2008) have shown that the standard RF permutation algorithm can assign unduly large importance to categorical variables with many categories (Strobl et al. 2007) and to highly collinear predictors (Strobl et al. 2008). This work uses simulations from multiple linear regression models with small numbers of variables to understand the issues raised by Strobl et al. (2008) regarding shortcomings of the original RF variable importance algorithm and the alternatives implemented in conditional forests (Strobl et al. 2008). In addition, this thesis examines the dependence of RF variable importance values on user-defined parameters.
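To make the permutation idea concrete, the following is a minimal sketch in plain Python of the kind of experiment the abstract describes. It is not the thesis's code. The coefficients (5, 0, 1), the noise level, the function names, and the simplified tree grower (quartile split points only, fixed maximum depth) are all assumptions chosen for illustration. Importance is measured as the average increase in out-of-bag mean squared error after one predictor's values are shuffled.

```python
import random
from statistics import mean

def simulate(n, rho, rng):
    """Draw n cases from y = 5*x1 + 0*x2 + 1*x3 + noise, with corr(x1, x2) = rho."""
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        # x2 has no effect on y but is collinear with x1 (assumed setup)
        x2 = rho * x1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        X.append([x1, x2, x3])
        y.append(5 * x1 + x3 + rng.gauss(0, 0.5))
    return X, y

def sse(y, idx):
    """Sum of squared deviations of the responses in idx around their mean."""
    m = mean(y[i] for i in idx)
    return sum((y[i] - m) ** 2 for i in idx)

def grow(X, y, idx, mtry, rng, depth=0, max_depth=5, min_size=10):
    """Grow a regression tree: a leaf is a float, a split is (var, threshold, left, right)."""
    if depth == max_depth or len(idx) < min_size:
        return mean(y[i] for i in idx)
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # mtry candidate variables per node
        vals = sorted(X[i][j] for i in idx)
        for q in (0.25, 0.5, 0.75):  # simplification: try quartile thresholds only
            t = vals[int(q * len(vals))]
            left = [i for i in idx if X[i][j] < t]
            right = [i for i in idx if X[i][j] >= t]
            if left and right:
                score = sse(y, left) + sse(y, right)
                if best is None or score < best[0]:
                    best = (score, j, t, left, right)
    if best is None:
        return mean(y[i] for i in idx)
    _, j, t, left, right = best
    return (j, t,
            grow(X, y, left, mtry, rng, depth + 1, max_depth, min_size),
            grow(X, y, right, mtry, rng, depth + 1, max_depth, min_size))

def predict(tree, x):
    """Follow splits down to a leaf and return its mean response."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] < t else right
    return tree

def permutation_importance(X, y, n_trees, mtry, rng):
    """Average increase in out-of-bag MSE when each predictor is permuted."""
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag cases
        tree = grow(X, y, boot, mtry, rng)
        base = mean((y[i] - predict(tree, X[i])) ** 2 for i in oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)  # break the link between predictor j and y
            perm = mean((y[i] - predict(tree, X[i][:j] + [v] + X[i][j + 1:])) ** 2
                        for i, v in zip(oob, shuffled))
            totals[j] += perm - base
    return [t / n_trees for t in totals]

rng = random.Random(1)
X, y = simulate(200, 0.9, rng)
for name, imp in zip(("x1", "x2", "x3"), permutation_importance(X, y, 30, 2, rng)):
    print(f"{name}: {imp:.2f}")
```

In this setup x2 has a true coefficient of zero, so any importance it receives comes only from its correlation with x1, which is the kind of behavior Strobl et al. (2008) discuss. Changing `rho`, `mtry`, or `n_trees` gives a rough feel for how the values depend on user-defined parameters.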

Checksum

9d2e1847aa15ec1023ea4d3a034c574c

Recommended Citation

Merrill, Andrew C., "Investigations of Variable Importance Measures Within Random Forests" (2009). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 7078.
https://digitalcommons.usu.edu/etd/7078

Included in

Statistics and Probability Commons

Copyright for this work is retained by the student.

DOI

https://doi.org/10.26076/5b9a-f34a
