Date of Award

8-2018

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee Chair(s)

Adele Cutler

Committee

Adele Cutler

Committee

Richard Cutler

Committee

John Stevens

Abstract

The Random Forest method, developed by Leo Breiman, is a useful machine learning tool. It has many implementations across programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. The comparison was carried out on a variety of real and simulated data sets with different levels of classification difficulty, numbers of predictors, and sample sizes. It shows unexpectedly different results among the three implementations.

Recommended Citation

Soifua, Breckell, "A Comparison of R, SAS, and Python Implementations of Random Forests" (2018). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 1268.
https://digitalcommons.usu.edu/gradreports/1268

Included in

Statistical Methodology Commons

Copyright for this work is retained by the student.

DOI

https://doi.org/10.26076/7f2c-fd4b

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

A Comparison of R, SAS, and Python Implementations of Random Forests
