## All Graduate Theses and Dissertations

#### Title

A New Perspective on Classification

5-2000

Dissertation

#### Degree Name:

Doctor of Philosophy (PhD)

#### Department:

Mathematics and Statistics

Adele Cutler

Richard Cutler

Renate Schaaf

#### Abstract

The idea of voting multiple decision rules was introduced into statistics by Breiman. He used bootstrap samples to build different decision rules, and then aggregated them by majority voting (bagging). In regression, bagging gives improved predictors by reducing the variance (random variation), while keeping the bias (systematic error) the same. Breiman introduced the idea of bias and variance for classification to explain how bagging works. However, Friedman showed that for the two-class situation, bias and variance influence the classification error in a very different way than they do in the regression case.
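The bagging procedure described above (bootstrap samples, one decision rule per sample, majority vote) can be sketched in a few lines. This is only an illustrative sketch, not code from the dissertation: the base rule is a one-feature decision stump chosen for brevity, and the names `fit_stump`, `bagged_classifier`, `n_rules`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a threshold rule to (x, y) pairs with y in {0, 1}.

    Illustrative base learner (an assumption, not the dissertation's choice):
    predict `left_label` when x <= t, and the other label otherwise.
    """
    best = None
    for t in sorted({x for x, _ in sample}):  # candidate thresholds
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return lambda x: left_label if x <= t else 1 - left_label


def bagged_classifier(train, n_rules=25, seed=0):
    """Build n_rules stumps on bootstrap samples and vote them."""
    rng = random.Random(seed)
    rules = []
    for _ in range(n_rules):
        # Bootstrap sample: draw len(train) points with replacement.
        boot = [rng.choice(train) for _ in train]
        rules.append(fit_stump(boot))

    def predict(x):
        # Majority vote over the individual decision rules.
        votes = Counter(rule(x) for rule in rules)
        return votes.most_common(1)[0][0]

    return predict


train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
predict = bagged_classifier(train)
print(predict(0.0), predict(1.0))
```

An odd `n_rules` avoids tied votes in the two-class case.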

In the first part of the dissertation, we build a theoretical framework for ensemble classifiers. Ensemble classifiers are currently the best off-the-shelf classifiers available, and they are the subject of much current research in classification. Our main theoretical results are two theorems about voting iid (independent and identically distributed) decision rules. The bias consistency theorem guarantees that voting will not change the Bias set, and the convergence theorem gives an explicit rate of convergence. The two theorems explain exactly how ensemble classifiers work. We also introduce the concept of weak consistency as opposed to the usual strong consistency. A boosting theorem is derived for a distribution-specific situation with iid voting.
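The theorems themselves are not stated in this abstract, so the following is not a rendering of them. It is a standard binomial calculation, offered as a rough illustration of why voting iid rules behaves this way at a single point: if each rule is independently correct there with probability `p`, the majority vote tends toward the correct class when `p > 0.5` and toward the wrong class when `p < 0.5`. The function name `majority_correct_prob` and the parameters `p` and `n` are hypothetical.

```python
from math import comb


def majority_correct_prob(p, n):
    """Probability that more than half of n independent votes are correct,
    when each vote is correct with probability p (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for n in (1, 11, 101):
    # Accuracy of the vote when a single rule is right 60% vs. 40% of the time.
    print(n, majority_correct_prob(0.6, n), majority_correct_prob(0.4, n))
```

Under these assumptions, voting amplifies whichever side of 0.5 a single rule falls on. It improves the points where one rule is usually right and cannot rescue the points where one rule is usually wrong.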

In the second part of this dissertation, we discuss a special ensemble classifier called PERT. PERT is a voted random tree classifier for which each random tree classifies every training example correctly. PERT is shown to work surprisingly well. We discuss its consistency properties. We then compare its behavior to the NN (nearest neighbor) method and boosted C4.5. Both of the latter methods also classify every training example correctly. We call these types of classifiers “oversensitive” methods. We show that one reason PERT works is its “squeezing effect.”

In the third part of this dissertation, we design simulation studies to investigate why boosting methods work. The outlier effect of PERT is discussed and compared to boosted and bagged tree methods. We obtain a new criterion (Bayes deviance) that measures the efficiency of a classification method. We design simulation studies to compare the efficiency of several common classification methods, including NN, PERT, and boosted tree methods.

#### Checksum

792e81c832f6bc18b088ab42c3fb4d50

#### DOI

https://doi.org/10.26076/65a0-85a1