Date of Award

2009

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

First Advisor

Adele Cutler

Abstract

Random forests are ensembles of trees that give accurate predictions for regression, classification and clustering problems. The CART tree, the base learner employed by random forests, has been criticized for bias in the selection of splitting variables, and this criticism casts doubt on the performance of random forests. Cforest, a new implementation of random forests based on Ctree (an implementation of conditional inference trees), was developed in response and is claimed to outperform random forests in both predictive power and variable importance measures.
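The split-selection bias mentioned above can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not the simulation used in this report. It assumes a binary response that is independent of two noise predictors: one binary variable, which offers only one candidate split, and one continuous variable, which offers many. For each predictor, an exhaustive CART-style search finds the split with the largest decrease in Gini impurity, and the predictor with the larger decrease is selected. The helper names `gini`, `best_split_gain` and `selection_frequency` are invented for this example. Neither predictor is informative, yet the continuous one is selected far more often than half the time.

```python
import random


def gini(counts):
    """Gini impurity of a node, given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def best_split_gain(x, y):
    """Largest decrease in weighted Gini impurity over all splits x <= t.

    Assumes y holds class labels 0/1. Every boundary between distinct
    sorted x values is tried, as in an exhaustive CART-style search.
    """
    n = len(y)
    pairs = sorted(zip(x, y))
    total = [y.count(0), y.count(1)]
    parent = gini(total)
    left = [0, 0]
    best = 0.0
    for i in range(n - 1):
        left[pairs[i][1]] += 1
        # A split can only fall between two distinct x values.
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        right = [total[0] - left[0], total[1] - left[1]]
        n_left = i + 1
        n_right = n - n_left
        child = (n_left * gini(left) + n_right * gini(right)) / n
        best = max(best, parent - child)
    return best


def selection_frequency(trials=500, n=50, seed=0):
    """Fraction of trials in which the continuous noise variable wins.

    Both predictors are independent of y, so an unbiased selection
    rule would pick each of them about half the time.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # 1 candidate split
        x_cont = [rng.random() for _ in range(n)]         # n - 1 candidate splits
        if best_split_gain(x_cont, y) > best_split_gain(x_binary, y):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(f"continuous noise variable selected in {selection_frequency():.0%} of trials")
```

The continuous predictor wins because taking the maximum over many candidate cut points inflates its apparent impurity reduction. Conditional inference trees such as Ctree address this by choosing the splitting variable with a significance test first and searching for the split point only afterwards.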

In this report we examine the underlying mechanisms of random forests and Cforest and compare the two methods on simulated data. Our study shows that, except in some extreme situations and with a proper choice of tuning parameter values, random forests provide higher prediction accuracy and more reliable variable importance measures than Cforest.
