Date of Award


Degree Type


Degree Name

Master of Science (MS)


Mathematics and Statistics

Committee Chair(s)

Richard Cutler




Statistical classification methods are among the most widely used statistical procedures in ecology. Applications include vegetation mapping by remote sensing (Steele 2000), discrimination of subspecies using morphological measurements (Fisher 1936, 1938; Conner and Schenk 2003), and species distribution modelling (Guisan and Zimmerman 2000). Examples of the last application abound in the ecological literature and include predicting the distribution or characteristics of plant species (see, e.g., Austin et al. 1990), predicting presence and absence of aquatic biota in streams (Hawkins et al. 2000), and habitat relationships of terrestrial animal species (Welch and MacMahon 2005). Over the last 20 years, two mainstays among species distribution methods have been logistic regression (Hosmer and Lemeshow 2001) and classification trees (Breiman et al. 1984; De'ath and Fabricius 2000). Recently, a number of computationally intensive classifiers have emerged from the machine learning literature, in which they are generally known as supervised learning methods (Gentleman et al. 2005:273). Several of these methods have been shown to generally have higher classification accuracies than traditional methods. In some examples the error rates for the best machine learning classifiers can be small fractions of the error rates for older methods (see, e.g., Cutler et al. 2007). Although machine learning methods are typically "black box-y" in the sense that they do not yield simple formulae relating predictive classifications to predictor variables, they appear to be gaining popularity in ecological applications, presumably because high classification accuracy outweighs all other considerations.