Date of Award

12-2016

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee Chair(s)

Adele Cutler

Committee

Adele Cutler

Committee

Richard Cutler

Committee

John Stevens

Abstract

Random Forests are memory-intensive machine learning algorithms, and most computers fail when building models from datasets with millions of observations. Using the Center for High Performance Computing (CHPC) at the University of Utah and an airline on-time arrival dataset with 7 million observations from the U.S. Department of Transportation Bureau of Transportation Statistics, we built 316 models by adjusting the depth of the trees and the randomness of each forest, and compared the accuracy and computation time of each. On this dataset, we found that substantial restrictions on the size of trees, the number of observations allowed for each tree, and the number of variables allowed for each split have little effect on accuracy but improve computation time by an order of magnitude.

A tutorial included at the end of the paper makes it significantly easier to become familiar with the CHPC.
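The comparison described in the abstract (one model per combination of restrictions, each scored for accuracy and timed) can be sketched roughly as follows. This is a minimal illustration and not the report's code: the grid values, the parameter names `max_depth`, `sample_fraction`, and `vars_per_split`, and the functions `run_grid` and `stand_in_trainer` are all hypothetical, and the trainer is a placeholder that fits nothing.

```python
import itertools
import time
from typing import Callable, Dict, List

# Hypothetical restriction grid; illustrative values, not the report's settings.
GRID = {
    "max_depth": [4, 8, 16],              # restricts the size of each tree
    "sample_fraction": [0.01, 0.1, 1.0],  # share of observations given to each tree
    "vars_per_split": [1, 3, 6],          # variables considered at each split
}


def run_grid(train_and_score: Callable[..., float],
             grid: Dict[str, list]) -> List[dict]:
    """Call the trainer once per combination, recording accuracy and wall time."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        start = time.perf_counter()
        accuracy = train_and_score(**params)
        elapsed = time.perf_counter() - start
        results.append({**params, "accuracy": accuracy, "seconds": elapsed})
    return results


def stand_in_trainer(max_depth: int, sample_fraction: float,
                     vars_per_split: int) -> float:
    """Placeholder (assumption): a real version would fit a forest and
    return its held-out accuracy. This one returns a constant."""
    return 0.0


results = run_grid(stand_in_trainer, GRID)
print(len(results), "configurations evaluated")
```

To run a real comparison, `stand_in_trainer` would be replaced by a function that fits a forest under the given restrictions and returns its held-out accuracy. In scikit-learn, for example, `RandomForestClassifier` exposes comparable controls through `max_depth`, `max_samples`, and `max_features`. Which software the report itself used is not stated here.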