Date of Award

8-2018

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee Chair(s)

Adele Cutler

Committee

Adele Cutler

Committee

Richard Cutler

Committee

John Stevens

Abstract

The Random Forest method is a useful machine learning tool developed by Leo Breiman. Many implementations exist across different programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. The comparison was carried out on a variety of real and simulated data with different levels of classification difficulty, numbers of predictors, and sample sizes. It shows unexpectedly different results among the three implementations.
