All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map

Carla M. Brookey, Utah State UniversityFollow

Date of Award:

12-2017

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Mathematics and Statistics

Committee Chair(s)

Richard Cutler

Committee

Richard Cutler

Committee

Adele Cutler

Committee

David Olsen

Abstract

Original analyses of a large vegetation cover dataset from Roosevelt National Forest in northern Colorado were carried out by Blackard (1998) and Blackard and Dean (1998; 2000). They compared the classification accuracies of linear and quadratic discriminant analysis (LDA and QDA) with artificial neural networks (ANN) and obtained an overall classification accuracy of 70.58% for a tuned ANN compared to 58.38% for LDA and 52.76% for QDA.

Because there has been tremendous development of machine learning classification methods over the last 35 years in both computer science and statistics, as well as substantial improvements in the speed of computer hardware, I applied five modern machine learning algorithms to the data to determine whether significant improvements in the classification accuracy were possible using one or more of these methods. I found that only a tuned gradient boosting machine had a higher accuracy (71.62%) that the ANN of Blackard and Dean (1998), and the difference in accuracies was only about 1%. Of the other four methods, Random Forests (RF), Support Vector Machines (SVM), Classification Trees (CT), and adaboosted trees (ADA), a tuned SVM and RF had accuracies of 67.17% and 67.57%, respectively.

The partition of the data by Blackard and Dean (1998) was unusual in that the training and validation datasets had equal representation of the seven vegetation classes, even though 85% of the data fell into classes 1 and 2. For the second part of my analyses I randomly selected 60% of the data for the training data and 20% for each of the validation data and test data. On this partition of the data a single classification tree achieved an accuracy of 92.63% on the test data and the accuracy of RF is 83.98%. Unsurprisingly, most of the gains in accuracy were in classes 1 and 2, the largest classes which also had the highest misclassification rates under the original partition of the data. By decreasing the size of the training data but maintaining the same relative occurrences of the vegetation classes as in the full dataset I found that even for a training dataset of the same size as that of Blackard and Dean (1998) a single classification tree was more accurate (73.80%) that the ANN of Blackard and Dean (1998) (70.58%).

The final part of my thesis was to explore the possibility that combining several of the machine learning classifiers predictions could result in higher predictive accuracies. In the analyses I carried out, the answer seems to be that increased accuracies do not occur with a simple voting of five machine learning classifiers.

Checksum

d2bfc7953a161743c7c8b837f3b4b229

Recommended Citation

Brookey, Carla M., "Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map" (2017). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 6962.
https://digitalcommons.usu.edu/etd/6962

Download

Included in

Statistics and Probability Commons

COinS

Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .

DOI

https://doi.org/10.26076/58c0-1ef5

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map

Date of Award:

Document Type:

Degree Name:

Department:

Committee Chair(s)

Committee

Committee

Committee

Abstract

Checksum

Recommended Citation

Included in

DOI

Browse

For Authors

Scholarly Communication

Research Data

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map

Author

Date of Award:

Document Type:

Degree Name:

Department:

Committee Chair(s)

Committee

Committee

Committee

Abstract

Checksum

Recommended Citation

Included in

Share

DOI

Browse

For Authors

Scholarly Communication

Research Data