Date of Award
Master of Science (MS)
Mathematics and Statistics
Random Forests are memory-intensive machine learning algorithms, and most computers would fail to build models from datasets with millions of observations. Using the Center for High Performance Computing (CHPC) at the University of Utah and an airline on-time arrival dataset with 7 million observations from the U.S. Department of Transportation's Bureau of Transportation Statistics, we built 316 models by varying the depth of the trees and the randomness of each forest, and compared the accuracy and computation time of each. On this dataset, we found that substantial restrictions on the size of the trees, the number of observations allowed for each tree, and the number of variables allowed for each split have little effect on accuracy but reduce computation time by an order of magnitude.
A tutorial included at the end of the paper makes it significantly easier to become familiar with the CHPC.
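The kind of comparison described in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the author's code and does not use the airline dataset: the data are synthetic, the forest is a deliberately simple toy, and every name and value here (`build_tree`, `fit_forest`, `compare`, `CONFIGS`, `max_depth`, `sample_frac`, `mtry`, the tree count and dataset sizes) is an assumption chosen for illustration. It only shows the shape of the experiment: fit forests under different restrictions on tree depth, observations per tree, and variables per split, then record accuracy and fitting time for each.

```python
import random
import time
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth, mtry, rng, depth=0):
    """Return a leaf label, or a (feature, threshold, left, right) node."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # variables tried at this split
        # Drop the largest value so both sides of every split are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled variable was constant in this node
        return majority
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left],
                   max_depth, mtry, rng, depth + 1),
        build_tree([X[i] for i in right], [y[i] for i in right],
                   max_depth, mtry, rng, depth + 1),
    )


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees, max_depth, sample_frac, mtry, seed=0):
    """Fit n_trees trees, each on a bootstrap sample of sample_frac * len(X) rows."""
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(X)))
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=k)  # sampling with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, mtry, rng))
    return forest


def accuracy(forest, X, y):
    """Fraction of rows whose majority-vote prediction matches the label."""
    hits = 0
    for x, label in zip(X, y):
        votes = Counter(predict_tree(tree, x) for tree in forest)
        hits += votes.most_common(1)[0][0] == label
    return hits / len(y)


def make_data(n, rng):
    """Synthetic stand-in data: 4 integer variables, label depends on the first two."""
    X = [[rng.randint(0, 9) for _ in range(4)] for _ in range(n)]
    y = [int(row[0] >= 5 and row[1] >= 5) for row in X]
    return X, y


def compare(configs, seed=0):
    """Fit one forest per configuration and record accuracy and fitting time."""
    rng = random.Random(seed)
    X_train, y_train = make_data(300, rng)
    X_test, y_test = make_data(200, rng)
    results = []
    for cfg in configs:
        start = time.perf_counter()
        forest = fit_forest(X_train, y_train, n_trees=10, seed=seed, **cfg)
        elapsed = time.perf_counter() - start
        results.append({**cfg,
                        "accuracy": accuracy(forest, X_test, y_test),
                        "seconds": elapsed})
    return results


# Hypothetical settings: one lightly restricted forest, one heavily restricted.
CONFIGS = [
    {"max_depth": 8, "sample_frac": 1.0, "mtry": 4},
    {"max_depth": 3, "sample_frac": 0.3, "mtry": 2},
]

if __name__ == "__main__":
    for row in compare(CONFIGS):
        print(row)
```

On a toy problem this small, the timings and accuracies say nothing about the 7-million-observation results reported above; the sketch is only meant to make the three restrictions concrete. A practical run would more likely use an established random forest library on a cluster such as the CHPC.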
Barton, Stephen, "Tutorial for Using the Center for High Performance Computing at The University of Utah and an example using Random Forest" (2016). All Graduate Plan B and other Reports. 873.