Date of Award
8-2017
Degree Type
Report
Degree Name
Master of Science (MS)
Department
Mathematics and Statistics
Committee Chair(s)
Adele Cutler
Committee
Adele Cutler
Committee
Jurgen Symanzik
Committee
Richard Cutler
Abstract
This project introduces two new methods for imputing missing data in random forests. The new methods are compared against other frequently used imputation methods, including those used in the randomForest package in R. To test the effectiveness of these methods, missing data are imputed in datasets whose values are missing under one of two mechanisms: missing at random or missing completely at random. After imputation, random forests are run on the data and the prediction accuracies are obtained. Because speed is an important aspect of computing, the speeds of all the tested methods are also compared.
One of the new methods is more accurate than the other and about as accurate as the most accurate method when data are missing at random, but both new methods are much slower on larger datasets. That method is not as accurate as other methods when data are missing completely at random.
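The two missing-data mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the report's code and does not implement either of the new methods: the data are synthetic, the function names (`make_mcar`, `make_mar`, `median_impute`) are hypothetical, and median fill is used only as a simple baseline, similar in spirit to the median-based rough fix offered by `na.roughfix` in the randomForest package. Under missing completely at random, every row has the same chance of losing a value; under missing at random, that chance depends on another column that is still observed.

```python
import random
import statistics


def make_mcar(rows, col, rate, rng):
    """Blank out column `col` with the same probability in every row (MCAR)."""
    return [
        [None if (j == col and rng.random() < rate) else v for j, v in enumerate(row)]
        for row in rows
    ]


def make_mar(rows, col, driver, threshold, rate, rng):
    """Blank out column `col` only in rows where the fully observed
    column `driver` exceeds `threshold` (MAR)."""
    return [
        [
            None if (j == col and row[driver] > threshold and rng.random() < rate) else v
            for j, v in enumerate(row)
        ]
        for row in rows
    ]


def median_impute(rows, col):
    """Fill missing entries of `col` with the median of its observed values."""
    observed = [row[col] for row in rows if row[col] is not None]
    fill = statistics.median(observed)
    return [
        [fill if (j == col and v is None) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Synthetic data (an assumption for illustration): column 1 depends on column 0.
rng = random.Random(0)
data = []
for _ in range(200):
    x0 = rng.uniform(0, 10)
    data.append([x0, 2.0 * x0 + rng.gauss(0, 1)])

# Hypothetical rates and threshold, chosen only to make the contrast visible.
mcar = make_mcar(data, col=1, rate=0.3, rng=rng)
mar = make_mar(data, col=1, driver=0, threshold=5.0, rate=0.6, rng=rng)

mcar_filled = median_impute(mcar, col=1)
mar_filled = median_impute(mar, col=1)
```

In the missing-at-random case the missing values are concentrated in rows with large values of column 0, so a single median computed from the observed entries tends to sit below the values it replaces. This is one reason an imputation method can rank differently under the two mechanisms.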
Recommended Citation
Young, Joshua, "Imputation for Random Forests" (2017). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 994.
https://digitalcommons.usu.edu/gradreports/994
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .