Date of Award
12-2016
Degree Type
Report
Degree Name
Master of Science (MS)
Department
Mathematics and Statistics
Committee Chair(s)
Adele Cutler
Committee
Adele Cutler
Richard Cutler
John Stevens
Abstract
Random Forests are memory-intensive machine learning algorithms, and most computers would fail to build models from datasets with millions of observations. Using the Center for High Performance Computing (CHPC) at the University of Utah and an airline on-time arrival dataset of 7 million observations from the U.S. Department of Transportation Bureau of Transportation Statistics, we built 316 models, varying the depth of the trees and the randomness of each forest, and compared their accuracy and training time. On this dataset we found that substantially restricting the size of the trees, the number of observations sampled for each tree, and the number of variables considered at each split has little effect on accuracy but improves computation time by an order of magnitude.
The tutorial included at the end of the paper makes becoming familiar with the CHPC significantly easier.
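The restrictions described in the abstract (tree depth, observations per tree, and variables per split) can be sketched as follows. This is a minimal illustration using scikit-learn's `RandomForestClassifier` on synthetic data, not the report's actual code, dataset, or parameter values, which are assumptions here.

```python
# Hypothetical sketch of the accuracy/time trade-off the report describes:
# an unrestricted forest vs. one with limited tree depth, a fraction of
# observations per tree, and fewer candidate variables per split.
# Dataset size and parameter values are illustrative only.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unrestricted forest: trees grow to full depth on bootstrap samples.
full = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)

# Restricted forest: shallow trees, fewer observations per tree,
# fewer candidate variables considered at each split.
restricted = RandomForestClassifier(
    n_estimators=100,
    max_depth=8,        # limit the size of each tree
    max_samples=0.25,   # fraction of observations drawn for each tree
    max_features=4,     # variables considered at each split
    random_state=0,
    n_jobs=-1,
)

for name, model in [("full", full), ("restricted", restricted)]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - t0
    acc = model.score(X_te, y_te)
    print(f"{name}: accuracy={acc:.3f}, fit time={elapsed:.2f}s")
```

On data like this, the restricted forest typically fits much faster with a similar test accuracy, mirroring the report's finding at a much smaller scale.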
Recommended Citation
Barton, Stephen, "Tutorial for Using the Center for High Performance Computing at The University of Utah and an example using Random Forest" (2016). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 873.
https://digitalcommons.usu.edu/gradreports/873
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us.