All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Comparison of Machine Learning Algorithms for Modeling Species Distributions: Application to Stream Invertebrates from Western USA Reference Sites

Margi Dubal, Utah State University

Date of Award

5-2008

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

Committee

Not specified

Abstract

Machine learning algorithms are increasingly being used by ecologists to model and predict the distributions of individual species and entire assemblages of sites. Accurate prediction of species distributions is central to any such modeling effort. We compared the prediction accuracy of four machine learning algorithms (random forests, classification trees, support vector machines, and gradient boosting machines) with that of a traditional method, linear discriminant models (LDMs), on a large set of stream invertebrate data collected at 728 reference sites in the western United States. Classifications were constructed for individual species and for assemblages of sites clustered a priori by similarity in biological characteristics. Predictive accuracy of the classifications was evaluated by computing the percent of sites correctly classified, sensitivity, specificity, kappa, and the area under the receiver operating characteristic curve on 10-fold cross-validated predictions from each classification method for each individual species and assemblage of sites. The predictions from each type of classification were used to estimate the Observed over Expected (O/E) index of taxa richness. Random forests generally produced the most accurate individual species models. However, none of the machine learning algorithms showed significant improvement over LDMs in classifying assemblages of sites or in the precision of the O/E index. The performance of support vector machines was particularly poor for classifying individual species and assemblages of sites, and resulted in greater bias in the O/E index. We believe that the performance of models developed for species at such large spatial scales may depend more on the predictor variables available than on the classification technique.
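The accuracy measures named in the abstract can be illustrated with a minimal Python sketch for a single species coded as present (1) or absent (0). This is not the report's own code: the function names, the 0.5 presence threshold, and the toy data are assumptions made for illustration, and the inputs are taken to be held-out predictions such as those produced by 10-fold cross-validation. Both classes are assumed to occur in the data.

```python
from typing import Dict, Sequence


def accuracy_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Percent correctly classified, sensitivity, specificity, and Cohen's kappa."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # presences predicted present
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # absences predicted absent
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # absences predicted present
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # presences predicted absent
    n = tp + tn + fp + fn

    pcc = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (pcc - p_chance) / (1.0 - p_chance)
    return {
        "pcc": pcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    presence site scores higher than a randomly chosen absence site
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions for eight sites (illustrative values only).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
predicted_class = [1 if p >= 0.5 else 0 for p in predicted_prob]  # assumed threshold

metrics = accuracy_metrics(observed, predicted_class)
metrics["auc"] = auc(observed, predicted_prob)
```

Kappa corrects the percent correctly classified for agreement expected by chance, which matters for rare species where predicting "absent" everywhere already yields a high raw percentage. The area under the curve is computed from the predicted probabilities and so does not depend on the threshold.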

Recommended Citation

Dubal, Margi, "Comparison of Machine Learning Algorithms for Modeling Species Distributions: Application to Stream Invertebrates from Western USA Reference Sites" (2008). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 1298.
https://digitalcommons.usu.edu/gradreports/1298

Included in

Mathematics Commons

Copyright for this work is retained by the student.

DOI

https://doi.org/10.26076/af24-0ea7
