Date of Award:


Document Type:


Degree Name:

Master of Science (MS)


Civil and Environmental Engineering


Dr. Mac McKee


The problem of quality assurance/quality control (QA/QC) for real-time measurements of environmental and water quality variables has been a field explored by many in recent years. The use of in situ sensors has become a common practice for acquiring real-time measurements that provide the basis for important natural resources management decisions. However, these sensors are susceptible to failure due to such things as human factors, lack of necessary maintenance, flaws on the transmission line or any part of the sensor, and unexpected changes in the sensors' surrounding conditions. Two types of machine learning techniques were used in this study to assess the detection of anomalous data points on turbidity readings from the Paradise site on the Little Bear River, in northern Utah: Artificial Neural Networks (ANNs) and Relevance Vector Machines (RVMs). ANN and RVM techniques were used to develop regression models capable of predicting upcoming Paradise site turbidity measurements and estimating confidence intervals associated with those predictions, to be later used to determine if a real measurement is an anomaly. Three cases were identified as important to evaluate as possible inputs for the regression models created: (1) only the reported values from the sensor from previous time steps, (2) reported values from the sensor from previous time steps and values of other water types of sensors from the same site as the target sensor, and (3) adding as inputs the previous readings from sensors from upstream sites. The decision of which of the models performed the best was made based on each model's ability to detect anomalous data points that were identified in a QA/QC analysis that was manually performed by a human technician. False positive and false negative rates for a range of confidence intervals were used as the measure of performance of the models. The RVM models were able to detect more anomalous points within narrower confidence intervals than the ANN models. At the same time, it was shown that incorporating as inputs measurements from other sensors at the same site as well as measurements from upstream sites can improve the performance of the models.