Title

Tuning Random Forests for Interpretability

Presenter Information

Kelvyn Bladen, Utah State University

Class

Article

College

College of Science

Department

Mathematics and Statistics Department

Faculty Mentor

D. Richard Cutler

Presentation Type

Poster Presentation

Abstract

Random Forests are a widely used predictive technique in the modern data analyst’s toolkit. As with other machine learning methods, Random Forests have hyper-parameters that should be tuned to obtain the best predictive accuracy and to support interpretation. Variable importance measures give users valuable insight into which features are most informative for prediction. The subject of my research is the commonly used permutation importance algorithm for Random Forests. The key results of my research are:

1. When predictive features are highly correlated, importance values can be misleading.

2. The best choice of the Random Forests hyper-parameter mtry for importances may be quite different from the best mtry for prediction, especially when features are highly correlated. When correlated features are byproducts of each other, larger values of mtry give superior importance values.

3. The square root of the importance values is a better measure than the raw values.

4. A collection of importance, accuracy, and association measures is more helpful than a single tuning measure.

I implemented plots and measures associated with the results above in a package for the R programming language to assist users of Random Forests. Ultimately, the package helps analysts tune Random Forests based on variable importance information as well as predictive accuracy.
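To illustrate the kind of comparison the abstract describes, the following is a minimal sketch in R. It uses the randomForest package and the built-in mtcars data (both illustrative choices here, not the presenter's package): it fits forests at several mtry values and prints square-root-transformed permutation importances for each.

# Minimal illustrative sketch, not the presenter's package.
library(randomForest)

x <- mtcars[, -1]   # predictors (several are highly correlated)
y <- mtcars$mpg     # response

set.seed(42)
for (m in c(2, 5, 10)) {
  fit <- randomForest(x, y, mtry = m, ntree = 500, importance = TRUE)
  # type = 1: permutation importance (mean decrease in accuracy / %IncMSE)
  imp <- importance(fit, type = 1, scale = FALSE)
  cat("mtry =", m, "\n")
  # Square-root transform of the raw importances (result 3 above),
  # clipping negative values at zero before taking the square root
  print(sort(sqrt(pmax(imp[, 1], 0)), decreasing = TRUE))
}

Comparing the printed rankings across mtry values shows how the importance ordering of correlated predictors can shift with this hyper-parameter, which is the tuning behavior the abstract highlights.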

Location

Logan, UT

Start Date

4-12-2023 12:30 PM

End Date

4-12-2023 1:30 PM

Included in

Mathematics Commons
