Date of Award

5-2011

Degree Type

Report

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair(s)

Stephen W. Clyde

Committee

Stephen W. Clyde

Committee

Vicki Allan

Committee

Scott Cannon

Abstract

A de-duplication tool used in CHARM-II, called the CHARM Matcher, produces log files that record why it decides that two records are or are not a match. This data, if properly analyzed, could help CHARM developers improve the Matcher over time by tuning its configuration. However, the log data is complex and is recorded chronologically rather than in a form that aids analysis, so studying the raw logs by eye is laborious and difficult. This report describes a tool that parses and organizes the raw log data and then produces graphical reports that summarize key performance indicators. These indicators give CHARM developers the information they need to improve the Matcher’s specificity and sensitivity [1] for any particular data source. A significant contribution of this report, and a prerequisite to creating a meaningful tool, was the investigation of possible performance indicators and the determination of which would be best suited to the existing CHARM Matcher. In anticipation of further evolution of the CHARM Matcher, the proposed tool is designed to be extensible, so that additional indicators and reports can be added later as the need arises.

Comments

This work was made publicly available electronically on June 13, 2012.
