Date of Award

8-2018

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee

Adele Cutler

Committee

Richard Cutler

Committee

John Stevens

Abstract

The Random Forest method is a useful machine learning tool developed by Leo Breiman. There are many existing implementations across different programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. The comparison was done on a variety of real and simulated data sets with different classification difficulty levels, numbers of predictors, and sample sizes. The comparison shows unexpectedly different results among the three implementations.
