Civil and Environmental Engineering Faculty Publications

Effect of missing data on performance of learning algorithms for hydrologic predictions: Implications to an imputation technique

M. K. Gill
T. Asefa
Mac McKee, Utah State University

Document Type

Article

Journal/Book Title/Conference

Water Resources Research

Volume

43

Publication Date

7-13-2007

Abstract

A common practice in preprocessing of data for use in hydrological modeling is to ignore observations with any missing variable values at any given time step, even if it is only one of the independent variables that is missing. In most cases, these rows of data are labeled incomplete and would not be used in either model building or subsequent model testing and verification. We argue that this is not necessarily an optimal approach for dealing with missing data because significant information could be lost when incomplete rows of data are discarded. Learning algorithms are affected by such problems more than physically based models because they rely heavily on data to learn the underlying input/output relationships of the systems being modeled. In this study, the extent of damage to the performance of learning algorithms due to missing data is explored in a field-scale application. To do so, we employed two well-known learning algorithms, namely artificial neural networks (ANNs) and support vector machines (SVMs) for short-term prediction of groundwater levels at a well field. Performance comparison is made by subjecting these algorithms to various levels of missing data. In addition to understanding the relative strengths of these algorithms in dealing with missing data, an approach for filling the data gaps in the form of an imputation methodology is proposed and tested against observed data. The utility of the current approach is further demonstrated by analyzing model runs obtained with and without imputed data. It is shown that as the percentage of missing data increases, the forecasting accuracy of ANNs is compromised more than that of SVMs. However, ANNs also derive the greater benefit from the use of imputed data.

Recommended Citation

Gill, M. K., T. Asefa, Y. Kaheil, and M. McKee. 2007. Effect of missing data on performance of learning algorithms for hydrologic predictions: Implications to an imputation technique, Water Resources Research, 43(W07416), doi:10.1029/2006WR005298.

This document is currently not available here.

DOI

https://doi.org/10.1029/2006WR005298
