Date of Award:

Summer 2017

Document Type:


Degree Name:

Master of Science (MS)


Department:

Mathematics and Statistics


Richard Cutler


The health of freshwater aquatic systems, particularly stream networks, is strongly influenced by water temperature, which controls biological processes and shapes species distributions and aquatic biodiversity. The thermal regimes of rivers are likely to change in the future because of climate change and other anthropogenic impacts, so our ability to predict stream temperatures will be critical for understanding distribution shifts of aquatic biota. Spatial statistical network (SSN) models account for spatial relationships along the stream network but have drawbacks, including long computation times and substantial data pre-processing requirements. Machine learning techniques and generalized additive models (GAM) are promising alternatives to SSN models. Two machine learning methods, gradient boosting machines (GBM) and random forests (RF), are computationally efficient and can automatically model complex data structures. However, no study has yet compared the predictive accuracy of a variety of widely used statistical modeling techniques for this purpose.

My objectives for this study were to 1) compare the predictive accuracy of linear models (LM), SSN, GAM, RF, and GBM for stream temperature on two stream networks and 2) provide guidelines to help practitioners and ecologists choose a prediction method. Prediction accuracy was compared across all methods using the test-set root mean square error (RMSE). For the observed data, SSN had the highest predictive accuracy overall, followed closely by GBM and GAM, while LM performed worst. This study shows that although SSN appears to be the most accurate method for stream temperature prediction, machine learning methods and GAM may be suitable alternatives.
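The comparison criterion, test-set RMSE, is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ computed over held-out observations, with lower values indicating more accurate predictions. The sketch below shows how such a ranking could be computed. It is illustrative only: the model labels `model_A`, `model_B`, and `model_C`, the temperatures, and the predictions are all hypothetical and are not data or results from this study.

```python
import math


def rmse(observed, predicted):
    """Test-set root mean square error: sqrt(mean((y - y_hat)^2))."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    sq_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def rank_methods(observed, predictions):
    """Return (method, RMSE) pairs sorted from lowest (best) to highest RMSE."""
    scores = {name: rmse(observed, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out stream temperatures (deg C) and made-up predictions
# from three hypothetical models; illustrative numbers only.
test_temps = [12.1, 14.3, 9.8, 16.5]
predictions = {
    "model_A": [13.0, 13.5, 11.0, 15.0],
    "model_B": [12.4, 14.0, 10.1, 16.1],
    "model_C": [12.3, 14.1, 10.0, 16.2],
}

for method, score in rank_methods(test_temps, predictions):
    print(f"{method}: RMSE = {score:.3f} deg C")
```

Because RMSE is in the same units as the response (degrees Celsius here), differences between methods can be read directly as differences in typical prediction error on the test set.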