Managing a community shared vocabulary for hydrologic observations
The ability to discover and integrate data from multiple sources, projects, and research efforts is critical as scientists investigate complex hydrologic processes at ever larger spatial and temporal scales. Until recently, syntactic and semantic heterogeneity among data from different sources made discovery and integration difficult. The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information System (HIS) was developed to improve access to hydrologic data. During development of the HIS, a major semantic challenge related to data sharing and publication emerged. The hydrology research community had no accepted vocabulary for describing hydrologic observations. This made it difficult to discover and synthesize data from multiple research groups even when access to the data was not a barrier. In addition, the hydrology research community relies heavily on data collected or assembled by government agencies such as the U.S. Geological Survey (USGS) and the U.S. Environmental Protection Agency (USEPA), each of which uses its own semantics for describing observations. This semantic heterogeneity across data sources complicated the development of tools for discovering and accessing data from multiple hydrologic sources by time, geographic region, measured variable, data collection method, and similar criteria. This paper describes a community shared vocabulary and the tools that support its management. Data publishers can use this vocabulary to populate metadata describing hydrologic observations, so that data from multiple sources published within the CUAHSI HIS are semantically consistent. We also describe how the CUAHSI HIS mediates between terms in the community shared vocabulary and those used by government agencies, supporting discovery and integration of datasets published by both academic researchers and government agencies.