Date of Award


Degree Type


Degree Name

Master of Science (MS)


Mathematics and Statistics

Committee Chair(s)

Adele Cutler


Adele Cutler


Jurgen Symanzik


Richard Cutler


This project introduces two new methods for imputing missing data in random forests. The new methods are compared against other frequently used imputation methods, including those in the randomForest package in R. To test their effectiveness, missing values are imputed in datasets whose missingness follows one of two mechanisms: missing at random or missing completely at random. After imputation, random forests are fit to the data and the prediction accuracies are recorded. Because speed is an important aspect of computing, the running times of all tested methods are also compared.

One of the new methods is more accurate than the other. It is about as accurate as the most accurate method when data are missing at random, but it is less accurate than other methods when data are missing completely at random. Both new methods become much slower as datasets grow larger.
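
The two missing-data mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the project's code: the function names (`mask_mcar`, `mask_mar`, `impute_median`), the synthetic data, the 20% missingness rate, and the rule that ties missingness to a second column are all assumptions made for illustration. Column-median filling stands in for a simple baseline imputation and is not one of the two new methods.

```python
import numpy as np

def mask_mcar(X, col, rate, rng):
    """MCAR: every value in `col` has the same chance of being removed."""
    X = X.copy()
    drop = rng.random(X.shape[0]) < rate
    X[drop, col] = np.nan
    return X

def mask_mar(X, col, driver, rate, rng):
    """MAR: the chance of removal in `col` depends on the observed column `driver`."""
    X = X.copy()
    # Assumed rule: rows with an above-median driver value go missing more often.
    high = X[:, driver] > np.median(X[:, driver])
    prob = np.where(high, 1.5 * rate, 0.5 * rate)
    drop = rng.random(X.shape[0]) < prob
    X[drop, col] = np.nan
    return X

def impute_median(X):
    """Fill each column's missing entries with that column's observed median."""
    X = X.copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # synthetic stand-in for a real dataset

X_mcar = mask_mcar(X, col=0, rate=0.2, rng=rng)
X_mar = mask_mar(X, col=0, driver=1, rate=0.2, rng=rng)
X_filled = impute_median(X_mar)          # ready to pass to a random forest
```

In a comparison like the one described, each imputed dataset would then be passed to a random forest, with prediction accuracy and running time recorded for each imputation method.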