Date of Award:


Document Type:


Degree Name:

Master of Science (MS)


Computer Science

Committee Chair(s)

Robert F. Erbacher


Robert F. Erbacher


Stephen J. Allan


Chad Mano


This research presents a new technique called SÁDI (statistical analysis data identification) for identifying the type of data stored on a digital device, and its storage format, from the values of the bytes that represent the data being examined. The research also incorporates the automation that specialized data identification tools need in order to be useful in real-world applications. Because SÁDI works from the byte values of the stored data, its accuracy does not rely solely on potentially misleading metadata but on the data itself, so it can identify what digitally stored data actually represents. Determining whether data is relevant often depends on first identifying its type. File type identification is usually based on file extensions or magic keys. These techniques fail in many common forensic analysis scenarios, such as embedded data (as in Microsoft Word files) and file fragments. They are also easily circumvented, and individuals with nefarious purposes often do so.

The results from the development of this technique will greatly enhance the capabilities of legal forensic units and expand the knowledge base in the fields of computer forensics and digital security. The results presented here are promising, although they do not yet represent the complete capability of the technique. They compare favorably with other techniques from recent research and with the capabilities and performance of the professional tools currently used in real-world forensic work.
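
As a rough illustration of what analyzing byte values rather than metadata can look like, the following minimal Python sketch computes a few statistics over a block of bytes. It is not the SÁDI implementation: the abstract does not say which statistics SÁDI uses, so the choice of mean, standard deviation, and Shannon entropy is an assumption, and the names `byte_statistics` and `looks_like_text`, as well as the 0.95 threshold, are hypothetical.

```python
import math
from collections import Counter


def byte_statistics(block: bytes) -> dict:
    """Return simple statistics over the byte values of a block.

    Assumption: these particular statistics are illustrative only;
    the abstract does not specify which ones SADI computes.
    """
    if not block:
        raise ValueError("block must not be empty")
    n = len(block)
    counts = Counter(block)  # iterating bytes yields ints in 0..255
    mean = sum(block) / n
    variance = sum((b - mean) ** 2 for b in block) / n
    # Shannon entropy in bits per byte (0 = constant, 8 = uniform).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "std_dev": math.sqrt(variance),
        "entropy": entropy,
        "distinct_values": len(counts),
    }


def looks_like_text(block: bytes, threshold: float = 0.95) -> bool:
    """Hypothetical heuristic: is the block mostly printable ASCII?

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    if not block:
        raise ValueError("block must not be empty")
    printable = sum(1 for b in block if 32 <= b <= 126 or b in (9, 10, 13))
    return printable / n_bytes(block) > threshold


def n_bytes(block: bytes) -> int:
    """Small helper so the ratio above reads clearly."""
    return len(block)
```

Because the statistics are computed per block, the same function can be applied to a whole file, a fragment, or a window sliding over embedded data, and none of these cases consults a file extension or a header signature.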