Date of Award:

5-2010

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Plants, Soils, and Climate

Advisor/Chair:

Janis L. Boettinger

Abstract

Initial soil surveys are incomplete for large tracts of public land in the western USA. Digital soil mapping offers a quantitative approach as an alternative to traditional soil mapping. I sought to predict soil classes across an arid to semiarid watershed of western Utah by applying random forests (RF) and using environmental covariates derived from Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and digital elevation models (DEM). Random forests are similar to classification and regression trees (CART). However, RF is doubly random. Many (e.g., 500) weak trees are grown (trained) independently because each tree is trained with a new randomly selected bootstrap sample, and a random subset of variables is used to split each node. To train and validate the RF trees, 561 soil descriptions were made in the field. An additional 111 points were added by case-based reasoning using aerial photo interpretation. As RF makes classification decisions from the mode of many independently grown trees, model uncertainty can be derived. The overall out-of-bag (OOB) error was lower without weighting of classes; weighting increased the overall OOB error and the resulting output did not reflect soil-landscape relationships observed in the field. The final RF model had an OOB error of 55.2% and predicted soils on landforms consistent with soil-landscape relationships. The OOB error for individual classes typically decreased with increasing class size. In addition to the final classification, I determined the second and third most likely classifications, model confidence, and the hypothetical extent of individual classes. Pixels that had a high possibility of belonging to multiple soil classes were aggregated using a minimum confidence value based on limiting soil features, which is an effective and objective method of determining membership in soil map unit associations and complexes mapped at the 1:24,000 scale.
Variables derived from both DEM and Landsat 7 ETM+ sources were important for predicting soil classes based on Gini and standard measures of variable importance and OOB errors from groves grown with exclusively DEM- or Landsat-derived data. Random forests was a powerful predictor of soil classes and produced outputs that facilitated further understanding of soil-landscape relationships.
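
The abstract notes that RF classifies by the mode of many independently grown trees, and that second and third most likely classes and a model confidence can be derived from the same votes. The sketch below illustrates that idea for a single pixel. It is not the thesis workflow: the function name `rank_classes`, the class labels, the ten example votes, and the definition of confidence as the fraction of trees voting for a class are all assumptions made for illustration.

```python
from collections import Counter


def rank_classes(tree_votes, top_n=3):
    """Rank classes by the fraction of trees that voted for each one.

    tree_votes: list of class labels, one per tree (hypothetical input).
    Returns up to top_n (label, vote_fraction) pairs, most-voted first.
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return [(label, count / total) for label, count in counts.most_common(top_n)]


# Hypothetical votes from 10 trees for one pixel; a real forest
# would typically have many more trees (e.g., 500).
votes = ["A", "B", "A", "C", "A", "B", "A", "B", "A", "C"]

ranking = rank_classes(votes)
predicted_class, confidence = ranking[0]  # the mode and its vote share
```

Here `predicted_class` is the modal class, `confidence` is its share of the votes, and `ranking[1]` and `ranking[2]` hold the second and third most likely classes under the same assumption.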

Comments

This work made publicly available electronically on August 30, 2010.
