Date of Award:

12-2011

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Civil and Environmental Engineering

Committee Chair(s)

Mac McKee

Committee

Mac McKee

Committee

David K. Stevens

Committee

Jeffery S. Horsburgh

Abstract

Improving the quality assurance/quality control (QA/QC) of real-time environmental measurements has been an active area of research in recent years. These measurements describe actual environmental conditions and processes, and they provide the information on which water quality management decisions are based. In situ sensors (located at the site of interest) are commonly used for such real-time measurements. However, the performance of these sensors can be affected by human factors, lack of necessary maintenance, faults in the transmission line or in any part of the sensor, and unexpected changes in the sensor's surrounding conditions. These issues have increased the importance of the early detection of anomalous data points within a recorded time series.

This research focuses on the detection of anomalous data points in turbidity readings from the Paradise site on the Little Bear River in northern Utah. Two machine learning techniques were used: Artificial Neural Networks (ANNs) and Relevance Vector Machines (RVMs). Both were used to develop regression models capable of predicting, with associated confidence intervals, what the turbidity value at the next time step at Paradise should be. ANNs have displayed good performance for this type of prediction, but RVMs had not yet been tested on the real-time anomaly detection problem. Because RVMs have consistently displayed better results than ANNs in other related applications, this research explores that technique in depth.
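As an illustration of how a one-step-ahead prediction with a confidence interval can be turned into an anomaly flag, the following is a minimal sketch and not the thesis's implementation. It assumes that some regression model (ANN, RVM, or other) has already produced a predicted value and a predictive standard deviation for each time step. The function name `flag_anomalies`, the parameter `z`, and the sample numbers are hypothetical.

```python
def flag_anomalies(observed, predicted, pred_std, z=1.96):
    """Flag observations that fall outside predicted +/- z * pred_std.

    observed, predicted, and pred_std are equal-length sequences of floats.
    z = 1.96 corresponds to a roughly 95% interval under a normal
    assumption; this choice is an assumption, not taken from the thesis.
    """
    if not (len(observed) == len(predicted) == len(pred_std)):
        raise ValueError("all inputs must have the same length")
    flags = []
    for obs, mean, std in zip(observed, predicted, pred_std):
        lower = mean - z * std
        upper = mean + z * std
        # A reading is treated as anomalous when it lies outside the interval.
        flags.append(not (lower <= obs <= upper))
    return flags


# Made-up turbidity readings: the third value is far outside its interval.
flags = flag_anomalies(
    observed=[5.0, 5.2, 40.0],
    predicted=[5.1, 5.1, 5.3],
    pred_std=[0.5, 0.5, 0.5],
)
```

Under this scheme a narrower interval (smaller predictive standard deviation) flags more points for the same readings, which is why interval width matters when comparing models.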

This research also examined whether results could be improved by evaluating a broader combination of inputs. Three cases were identified as important: (1) only the values reported by the target sensor at previous time steps; (2) values reported by the target sensor at previous time steps plus values from other types of water sensors at the same site; and (3) the addition, as inputs, of previous readings from sensors at upstream sites.
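The three input cases can be pictured as progressively wider lagged input vectors. The sketch below is only an assumption about how such inputs might be assembled; the function name `build_lagged_inputs`, the number of lags, and the sample series are hypothetical and do not come from the thesis.

```python
def build_lagged_inputs(target, n_lags, extra_series=()):
    """Build (X, y) pairs for one-step-ahead prediction.

    target       : readings from the target sensor
    n_lags       : number of previous time steps used as inputs
    extra_series : other series of the same length (for example, other
                   sensors at the same site, or sensors at upstream sites)
    """
    for series in extra_series:
        if len(series) != len(target):
            raise ValueError("all series must have the same length")
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = list(target[t - n_lags:t])
        for series in extra_series:
            # Append the same lag window taken from each additional series.
            row.extend(series[t - n_lags:t])
        X.append(row)
        y.append(target[t])
    return X, y


turbidity = [1.0, 2.0, 3.0, 4.0]          # target sensor (made-up values)
same_site = [10.0, 20.0, 30.0, 40.0]      # another sensor at the same site
upstream = [0.1, 0.2, 0.3, 0.4]           # a sensor at an upstream site

X1, y1 = build_lagged_inputs(turbidity, 2)                         # case 1
X2, y2 = build_lagged_inputs(turbidity, 2, [same_site])            # case 2
X3, y3 = build_lagged_inputs(turbidity, 2, [same_site, upstream])  # case 3
```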

Points flagged as anomalous by the models were compared with the data points identified in a QA/QC analysis performed by a human technician. This comparison yielded the success rate of the models, which was then expressed in terms of false positives and false negatives.
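A comparison of this kind can be sketched as follows, treating the technician's QA/QC labels as the reference. The rate definitions used here (false positives over all technician-normal points, false negatives over all technician-anomalous points) are a common convention and an assumption; the thesis may define its rates differently. The function name `error_rates` and the sample flags are hypothetical.

```python
import math


def error_rates(model_flags, technician_flags):
    """Return (false_positive_rate, false_negative_rate).

    Both arguments are equal-length sequences of booleans, where True
    means the point was marked anomalous. A rate is NaN when its
    denominator is zero.
    """
    if len(model_flags) != len(technician_flags):
        raise ValueError("inputs must have the same length")
    tp = fp = tn = fn = 0
    for model, truth in zip(model_flags, technician_flags):
        if model and truth:
            tp += 1
        elif model and not truth:
            fp += 1
        elif not model and truth:
            fn += 1
        else:
            tn += 1
    fp_rate = fp / (fp + tn) if (fp + tn) else math.nan
    fn_rate = fn / (fn + tp) if (fn + tp) else math.nan
    return fp_rate, fn_rate


fp_rate, fn_rate = error_rates(
    model_flags=[True, False, True, False],
    technician_flags=[True, True, False, False],
)
```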

Results showed that including as inputs measurements from other sensors at the same site, as well as measurements from upstream sites, can improve the models' performance. The RVM models also detected more anomalous points within narrower confidence intervals than the ANN models.

Checksum

59724780358c6c102c99f713229b2d9e
