Date of Award:

2000

Document Type:

Dissertation

Degree Name:

Doctor of Philosophy (PhD)

Department:

Mathematics and Statistics

Advisor/Chair:

Adele Cutler

Co-Advisor/Chair:

Richard Cutler

Third Advisor:

Renate Schaaf

Abstract

The idea of voting multiple decision rules was introduced into statistics by Breiman. He used bootstrap samples to build different decision rules and then aggregated them by majority voting (bagging). In regression, bagging gives improved predictors by reducing the variance (random variation) while keeping the bias (systematic error) the same. Breiman introduced the idea of bias and variance for classification to explain how bagging works. However, Friedman showed that for the two-class situation, bias and variance influence the classification error in a very different way than they do in the regression case.
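
As a rough illustration of the bagging procedure described above (bootstrap samples, one decision rule per sample, majority vote), the following minimal sketch may help. It is not taken from the dissertation: the one-dimensional threshold rule (`fit_stump`), the function names, the toy data, and the choice of 25 rules are all assumptions made for illustration only.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a 1-D threshold rule (hypothetical base learner) with minimal training error."""
    best = None
    for t in sorted({x for x, _ in sample}):
        # Try both orientations: (label at or below t, label above t).
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_classifier(train, n_rules=25, seed=0):
    """Build n_rules rules on bootstrap samples and aggregate them by majority vote."""
    rng = random.Random(seed)
    rules = []
    for _ in range(n_rules):
        # Bootstrap sample: draw len(train) points with replacement.
        boot = [rng.choice(train) for _ in train]
        rules.append(fit_stump(boot))

    def vote(x):
        # Majority vote over the individual decision rules.
        return Counter(rule(x) for rule in rules).most_common(1)[0][0]

    return vote

# Toy two-class data (an assumption): label 1 when x > 5, else 0.
train = [(x, int(x > 5)) for x in range(11)]
vote = bagged_classifier(train)
print(vote(2), vote(9))
```

Each individual rule depends on which points its bootstrap sample happened to include; the vote averages out that sample-to-sample variation, which is the variance reduction the abstract refers to.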

In the first part of the dissertation, we build a theoretical framework for ensemble classifiers. Ensemble classifiers are currently the best off-the-shelf classifiers available, and they are the subject of much current research in classification. Our main theoretical results are two theorems about voting iid (independent and identically distributed) decision rules. The bias consistency theorem guarantees that voting will not change the Bias set, and the convergence theorem gives an explicit rate of convergence. The two theorems explain exactly how ensemble classifiers work. We also introduce the concept of weak consistency as opposed to the usual strong consistency. A boosting theorem is derived for a distribution-specific situation with iid voting.

In the second part of this dissertation, we discuss a special ensemble classifier called PERT. PERT is a voted random tree classifier for which each random tree classifies every training example correctly. PERT is shown to work surprisingly well. We discuss its consistency properties. We then compare its behavior to the NN (nearest neighbor) method and boosted C4.5. Both of the latter methods also classify every training example correctly. We call these types of classifiers “oversensitive” methods. We show that one reason PERT works is its “squeezing effect.”

In the third part of this dissertation, we design simulation studies to investigate why boosting methods work. The outlier effect of PERT is discussed and compared to boosted and bagged tree methods. We obtain a new criterion (Bayes deviance) that measures the efficiency of a classification method. We design simulation studies to compare the efficiency of several common classification methods, including NN, PERT, and boosted tree methods.

Checksum

f3fe0a16d9cad6cbe2dbd11748e17380

Included in

Mathematics Commons
