Johns Hopkins University Press
In response to the growing prominence of quantification in the humanities, scholars of media and digital culture have highlighted the friction between the cultural and disciplinary roles of data and the epistemologies of humanistic inquiry. Johanna Drucker aptly characterizes the humanities as fields that emphasize “the situated, partial, and constitutive character of knowledge production,” while data are often taken to be representations of “observer-independent reality.” Lisa Gitelman and Virginia Jackson likewise critique the dominant assumption of data’s transparency: data, they insist, “are always already ‘cooked’ and never entirely ‘raw.’” The choices involved in data collection and preparation are not objective; they are shaped by the always subjective, often tacit, and sometimes shared presuppositions of the domain-specialist researcher. Practitioners of computational approaches to literature have shown that analyzing large corpora of texts “at a distance” may reveal phenomena not readily accessible through close reading of individual texts. Yet the notion of distance fosters an illusion of objectivity that often occludes the preconditions of such work: the transformation of cultural artifacts into objects in a series that can be embedded into computational spaces. Printed codices must become .txt files; properties of artifacts must be organized into a .csv file. That is, texts, archival materials, and historical individuals must become data, in a process that involves choices about collection, curation, and preparation. The effects of this process have seldom been theorized as part of these large-scale analyses.
To bring a more nuanced understanding of data’s mediated and constructed nature to the work of large-scale digital analysis requires a historicized and theorized account of the resources that enable it. New digital collections and databases have undoubtedly presented researchers with powerful ways to explore cultural artifacts, but their interfaces frequently efface the criteria for inclusion and exclusion in their underlying collections, bolstering the illusion that they are authoritative and comprehensive. As a way of accounting for the underlying instability of the digital archive, Bonnie Mak advocates an “archaeological approach” to collections such as Early English Books Online, which bears the traces of earlier catalogs and microfilm resources.
In this essay we model one such approach, theorizing a new dataset of our own creation as a description that mediates and transforms our relationship to the objects it describes. While quantitative humanities scholarship is currently preoccupied with how to make meaning from large-scale analyses, we wish to shift attention to the meaning-making problems on the other side of the numbers. Rather than the massive datasets, sophisticated computational models, or rich visualizations that characterize many digital humanities approaches, we offer an account of the preconditions that enable such approaches, and we do so with regard to a single feature—that of genre. In tracing the transformation of archival artifacts into data objects, we argue that a more reflective approach to quantitative analysis opens up new interpretive terrain—terrain that takes advantage of the opportunities available at scale while maintaining the humanities’ commitment to ambiguity, mediation, and situatedness. Such an approach is necessary if the digital humanities are to remain humanistic and avoid the worst excesses of data determinism.
Vareschi, Mark and Burket, Mattie, "Archives, Numbers, Meaning: The Eighteenth-Century Playbill at Scale" (2016). English Faculty Publications. Paper 795.