Date of Award

8-2017

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee Chair(s)

Adele Cutler

Committee

Adele Cutler

Committee

Jurgen Symanzik

Committee

Richard Cutler

Abstract

This project introduces two new methods for imputing missing data in random forests. The new methods are compared against other frequently used imputation methods, including those used in the randomForest package in R. To test the effectiveness of these methods, missing values are imputed in datasets under two missing-data mechanisms: missing at random and missing completely at random. After imputation, random forests are run on the data and prediction accuracies are obtained. Because speed is an important aspect of computing, the speeds of all tested methods are also compared.

One of the new methods is more accurate than the other and is about as accurate as the most accurate method when data are missing at random, but both new methods are much slower on larger datasets. That method is not as accurate as other methods when data are missing completely at random.
