Date of Award
5-2011
Degree Type
Report
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair(s)
Stephen W. Clyde
Committee
Stephen W. Clyde
Committee
Vicki Allan
Committee
Scott Cannon
Abstract
A de-duplication tool used in CHARM-II, called the CHARM Matcher, produces log files that record why it decides that two records are or are not a match. This data, if properly analyzed, could help CHARM developers improve the Matcher over time by tuning its configuration. However, the log data is complex and is recorded chronologically rather than in a form that aids analysis, and studying the raw log data visually is laborious and difficult. This report describes a tool that parses and organizes the raw log data and then produces graphical reports that summarize key performance indicators. These indicators give CHARM developers the information they need to improve the Matcher’s specificity and sensitivity [1] for any particular data source. A significant contribution of this report, and a prerequisite to creating a meaningful tool, was the investigation of possible performance indicators and the determination of which would be best suited to the existing CHARM Matcher. In anticipation of further evolution of the CHARM Matcher, the proposed tool is designed to be extensible, so that additional indicators and reports can be added later as the need arises.
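To make the two indicators named in the abstract concrete, the following is a minimal sketch of how sensitivity and specificity could be computed from match decisions once they have been parsed. It is not the report's implementation: the function name, the input shape (pairs of the Matcher's decision and a known correct answer), and the availability of such correct answers are all assumptions made for illustration, and the actual CHARM Matcher log format is not reproduced here.

```python
from collections import Counter


def sensitivity_specificity(decisions):
    """Compute (sensitivity, specificity) from match decisions.

    `decisions` is a hypothetical input: an iterable of
    (predicted_match, actual_match) boolean pairs, where `predicted_match`
    is what the matcher decided and `actual_match` is an assumed
    ground-truth label. Returns None for an indicator that is undefined
    because its denominator is zero.
    """
    counts = Counter()
    for predicted, actual in decisions:
        if predicted and actual:
            counts["tp"] += 1  # true positive: correctly declared a match
        elif predicted and not actual:
            counts["fp"] += 1  # false positive: declared a match wrongly
        elif not predicted and actual:
            counts["fn"] += 1  # false negative: missed a real match
        else:
            counts["tn"] += 1  # true negative: correctly declared a non-match

    actual_matches = counts["tp"] + counts["fn"]
    actual_non_matches = counts["tn"] + counts["fp"]

    # Sensitivity: share of real matches that were found.
    sensitivity = counts["tp"] / actual_matches if actual_matches else None
    # Specificity: share of real non-matches that were rejected.
    specificity = counts["tn"] / actual_non_matches if actual_non_matches else None
    return sensitivity, specificity


# Illustrative, made-up decisions for a single data source.
example = [
    (True, True), (True, True), (False, True),
    (False, False), (True, False), (False, False), (False, False),
]
sens, spec = sensitivity_specificity(example)
```

Computing the pair separately for each data source would give one way to compare how a given Matcher configuration behaves across sources, which is the kind of per-source tuning the abstract describes.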
Recommended Citation
Erickson, Daniel, "Log-Data Visualization Tool for Analyzing and Improving Performance of Data De-Duplication Tool in Charm-II" (2011). All Graduate Plan B and other Reports, Spring 1920 to Spring 2023. 164.
https://digitalcommons.usu.edu/gradreports/164
Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .
Comments
This work made publicly available electronically on June 13, 2012.