Master of Science (MS)
Mathematics and Statistics
Machine learning algorithms are increasingly being used by ecologists to model and predict the distributions of individual species and of entire assemblages of sites. Accurate prediction of species distributions is central to such modeling. We compared the prediction accuracy of four machine learning algorithms (random forests, classification trees, support vector machines, and gradient boosting machines) with that of a traditional method, linear discriminant models (LDMs), on a large set of stream invertebrate data collected at 728 reference sites in the western United States. Classifications were constructed for individual species and for assemblages of sites clustered a priori by similarity in biological characteristics. Predictive accuracy was evaluated by computing the percentage of sites correctly classified, sensitivity, specificity, kappa, and the area under the receiver operating characteristic (ROC) curve from the 10-fold cross-validated predictions of each classification method, for each individual species and each assemblage of sites. The predictions from each type of classification were used to estimate the observed-over-expected (O/E) index of taxa richness. Random forests generally produced the most accurate individual species models. However, none of the machine learning algorithms significantly improved on LDMs for classifying assemblages of sites or for the precision of the O/E index. Support vector machines performed particularly poorly in classifying both individual species and assemblages of sites, and resulted in greater bias in the O/E index. We believe that the performance of models developed for species at such large spatial scales may depend more on the available predictor variables than on the classification technique.
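The accuracy measures named in the abstract have standard definitions. The following is a minimal Python sketch, not the author's code, of how they might be computed for one taxon from out-of-fold (for example, 10-fold cross-validated) predictions, together with a site-level O/E index. The function names, the 0.5 classification threshold, and the rule of counting only taxa with a predicted probability of at least 0.5 in both O and E (a common RIVPACS-style convention) are assumptions; the thesis may have used different cutoffs.

```python
import math


def classification_metrics(observed, predicted_prob, threshold=0.5):
    """Percent correctly classified, sensitivity, specificity and Cohen's kappa
    for one taxon across sites. `observed` holds 0/1 presences; `predicted_prob`
    holds predicted probabilities of presence. The threshold is an assumption."""
    tp = fp = tn = fn = 0
    for y, p in zip(observed, predicted_prob):
        predicted = p >= threshold
        if y and predicted:
            tp += 1
        elif y:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    pcc = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    # Chance agreement from the row and column totals of the confusion matrix.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (pcc - p_chance) / (1 - p_chance) if p_chance < 1 else math.nan
    return {"pcc": pcc, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


def roc_auc(observed, predicted_prob):
    """Area under the ROC curve via the Mann-Whitney formulation: the chance
    that a random presence site scores higher than a random absence site."""
    pos = [p for y, p in zip(observed, predicted_prob) if y]
    neg = [p for y, p in zip(observed, predicted_prob) if not y]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def observed_over_expected(observed, predicted_prob, p_min=0.5):
    """O/E for one site across taxa: E sums the predicted probabilities of the
    taxa with p >= p_min, and O counts how many of those taxa were observed.
    The p_min cutoff is an assumed convention."""
    included = [(y, p) for y, p in zip(observed, predicted_prob) if p >= p_min]
    expected = sum(p for _, p in included)
    observed_count = sum(y for y, _ in included)
    return observed_count / expected if expected > 0 else math.nan


if __name__ == "__main__":
    # Hypothetical data: one taxon at six sites.
    taxon_obs = [1, 1, 0, 0, 1, 0]
    taxon_prob = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
    print(classification_metrics(taxon_obs, taxon_prob))
    print(roc_auc(taxon_obs, taxon_prob))
    # Hypothetical data: four taxa at one site.
    print(observed_over_expected([1, 0, 1, 1], [0.95, 0.7, 0.55, 0.2]))
```

Note the difference in orientation: the classification metrics summarize one taxon across sites, whereas O/E summarizes one site across taxa. The Mann-Whitney form of the AUC avoids building the ROC curve explicitly and gives the same area.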
Dubal, Margi, "Comparison of Machine Learning Algorithms for Modeling Species Distributions: Application to Stream Invertebrates from Western USA Reference Sites" (2008). All Graduate Plan B and other Reports. 1298.
Copyright for this work is retained by the student.