ARPA Journal editor Troy Conrad Therrien interviewed Jeffrey Schnapp on the confusions of search. Schnapp is the faculty co-director of the Berkman Center for Internet and Society at Harvard, Professor of Romance Languages & Literature and Comparative Literature, and also on the teaching faculty in the Department of Architecture at the Graduate School of Design. Co-author of Digital_Humanities from MIT Press, faculty director of metaLAB (at) Harvard, and a former founding director of the Stanford Humanities Lab, Schnapp is both a pioneer and a contemporary leader in the field of the digital humanities.
Troy Conrad Therrien: The search engine is disappearing. It is fading into an invisible ubiquity precisely because it is no longer seen as just a machine. We have reduced the search engine simply to “search,” a utility invoked often as a verb, rarely as a noun. Nouns are reserved for objects we wish to handle and have a handle on. Rather, as a verb, search casts a form of disenchanted magic: the same magic that allows us to tolerate things like air travel. We put our lives in its hands specifically–perhaps only–because we don’t understand it. Yet, we are no longer in awe of its feats. Like a light switch, it is our interface to a vast economic, political, geographic and material apparatus. And still it provides us with an unquestioned, fundamentally modern privilege. We do not have to expand our cognitive encounter beyond a single binary operation. This frictionless encounter with search, which in turn buries the engine in its effects, allows for the naturalization of search into systems of knowledge production, dissemination, and adoption.
You have an admirable history of bucking orthodoxy–your early work on the Futurists comes to mind–so I offer the preceding commentary as something of a foil, as a representation of the orthodoxy of search critique. In this exchange, I would like to discuss the effects of search in order to consider the materiality of the search engine. I would like to do this by interrogating certain sites of confusion exacerbated by the disappearance of the search engine.
One site of confusion is the distinction between the web and the Internet. Search popularized the web in a wave of such precipitous growth that for many it has become indistinguishable from the Internet. Does the way we understand web search extend to search at the scale of the Internet? Now that we have so many Internet interfaces, from mobile phones and wearables to networked domestic and urban objects, what are the consequences of this reductive metonymy?
Jeffrey Schnapp: The naturalization of search, the growing assumption that search is a given, a kind of utility like water (because transparent) or like electricity (because reliable and abundant), a piece of infrastructure and not a cognitive filter or socioeconomic construct, carries with it illusions of universal access and informational instantaneity within the distinctive mode of exchanging information over the Internet that is the World Wide Web.
The Web is, of course, the smaller of the two containers. It is merely one way of leveraging the gargantuan network of networks that is the Internet—a global system of data superpipelines that rely not just upon http communications, but also upon ftp, smtp, and the like.
To equate the poorer with the richer data ecology paves the way for a two-fold mystification. On the one hand, for a carryover of the myth of universal access and informational instantaneity to the Internet as a whole; on the other, for a lack of recognition of just how opaque and inaccessible the Web is, not to mention the subterranean continents of scarcely visible data or flows between nodes and machines that make up the Internet as a whole. Whereas the Web is the defining public(-facing) space of our era, the Internet remains the foundation upon which the Web is built. Like their analog equivalents, both are mixed spaces, sites where public and private performances as well as forms of exchange coexist with systems of control and surveillance as well as outlaw realms, variously tolerated and policed.
Now that it has become naturalized, search serves as a kind of master metaphor for transparency, intelligibility, and universal access. Once there were at least vestiges of a public conversation about page or author-rank algorithms; that conversation has now receded into the woodwork, reduced to brand names like Google vs. Yahoo vs. Bing, to chatter about advertising and personal data tracking, to worries about privacy.
How many of us are there who remember how quirky and contrasting the results were of engines like WebCrawler, Lycos, Inktomi, or HotBot, when set loose on the same data search-and-retrieval mission?
The pins in every haystack are now assumed to be readily findable, at the immediate edges of one’s fingertips (but they are not if shaped like pegs or posts instead of pins).
TCT: As historical archives become digitized, various online stores of the Internet have become a principal record of discourse. You were involved in pioneering a “digital humanities” effort through the Digital Dante Project in the early 1980s. You’ve spoken about the dream to deliver “rigor, finesse, and archival dust in seductive prose.” Is there a difference between an archive and a data set? Does search play a role in making such a distinction?
JS: Archives and data sets share certain fundamental features, though there are navigational and ontological differences. Both are curatorial constructs, providing structured environments for the accumulation, preservation, and consultation of records. Both rely upon reductive/standardized methods of description (data fields). Both chunk information into bite-size bits (files, folders, collections) and make interpretive assumptions about the primacy of chronology or biography or an institutional taxonomy. The result is a privileging of certain points and modes of access, and these points and modes, in turn, can inform processes of discovery, the elaboration of storylines, and even the forging of meanings.
This said, the world of the analog archive is more expansive and inclusive. Its sensorium includes touch and smell. Its taxonomies and metadata fields are less abstract and reductive. There’s simply no overriding reason why the debris of the world can’t be left to accumulate in its files. Much of the human record can’t be readily converted into data or capta (but it can be dumped in any physical container).
The world of data sets is potentially far more comprehensive and exhaustive, but it’s also more fragile and volatile. Data rots at a faster rate than paper; its decay rate is closer to that of speech than of paper-based records. So, on the one hand, we are experiencing an explosion of digital records and documentation regarding all aspects of contemporary life (including many aspects that no analog archive would or could ever have documented); on the other, we are grappling with a new kind of archival beast: one that lives under a perpetual threat of extinction and erasure.
I’m excited by the prospects that the universe of data sets opens up: the possibility, for instance, of scrutinizing a contemporary event like the 2011 Japan earthquake down to the scale of milliseconds 1 or the opportunity to zoom out to macro scales of analysis and viewing, so as to visualize, for example, large cultural collections as aggregates and thereby expose otherwise invisible histories of collecting 2.
Data visualizations are not a science, but rather a craft and interpretive practice—and perhaps a more heavy-handed one than search. When compared to traditional finding aids—the creations, typically, of a single archivist—search multiplies the doors through which one gains access to a record. But it does so at the expense of context (in all but the best-designed database environments). Among the items that can get lost are the “artifactual” dimensions of data sets, not to mention analog archives: the ways in which they are always filled with gaps, duplications, inconsistencies that tell unintended stories (about shifts in taxonomy, categories that overlap, institutional biases).
TCT: Scholarship used to summon images of discursive pursuits, whereas research was often associated with scientific environments and methods. Now, it seems that all academic inquiries begin with search, leading investigators on heterogeneous paths. Has search leveled out the difference between scholarship and research? Are the two terms more confused today, or has it always been a false distinction?
JS: I think the scholarship vs. research distinction expresses assumptions that are unique to the Anglo-American world: namely, that the humanities are disjunct from the sciences rather than part of a continuum that extends from Geisteswissenschaften to Naturwissenschaften or Sciences humaines to Sciences naturelles.
If the distinction ever made sense, I don’t think it does in a contemporary world filled with hybrid practices.
TCT: Kate Crawford recently presented the mythology of big data as the belief that “with more data comes greater accuracy and truth.”3 The telos of this epistemology takes search as a given: the value of the accumulation of information hinges on its searchability, else it provides only greater confusion. In this schema information is confused for knowledge. How do you see the relationship between information and knowledge historically? Has it changed of late? Will it change in the future? What role does, did or will search play in this trajectory?
JS: This is a crucial question. The mere existence of data sets, no matter how big or how small, does very little in or for the world. “Searchability” is simply one way of designating the fact that information has to be massaged into shapes that render it usable, which is to say, translatable into knowledge. And this (curatorial) labor typically involves the imposition of filters, hierarchies of value, taxonomical schemes, and the like.
One can, of course, work with vast sets of messy, noisy or unstructured data. But to do so is time-consuming, inefficient, and useful only to extract certain very “thin” qualities of information. If knowledge means, as I take it to mean, a deeper form of grappling with understanding, then “information” is always only a starting point. However transformational raw information’s potential is, that potentiality can only be fulfilled by means of labor-intensive processes of activation.
TCT: In After Art, David Joselit diagnoses an “image population explosion” beginning in the 20th century and accelerating today through digital media. The result is what he calls “the epistemology of search, a polemical kind of phrase and a category to say that what matters aesthetically, but also in terms of information, is not making content but configuring it, searching for it, finding what you need and making meaning from it.” A result in architecture is the creation of form as an interpretation of the results of data mining, the work of Koolhaas and parametricism being exemplary. This approach seems to confuse data visualization with design. Is this method a new form of design or something else altogether?
JS: There is no question in my mind that data visualization has emerged as one of the defining knowledge-design practices of our era, displacing a wide array of previously predominant forms of socio-cultural communication, argument, and persuasion. The mining of data streams, from tweets to Instagram photos, and the translation of statistical data into interactive portraits of trends, behaviors, and institutions have become pervasive from universities to corporate boardrooms to newsrooms. As noted earlier, I understand data visualization as a rhetorical craft; its raw materials are data sets that have been curated, sifted, tinkered with, and massaged; its outputs are persuasive in intent.
I would place all these operations under the umbrella of a “design practice”: a design practice that, like all design practices, can be employed to a multitude of ends, from truth-telling to entertainment to deception.
There’s no doubt in my mind that the contemporary pervasiveness of data visualization completes a process that commenced during the early years of the era of industry. Namely, it marks the definitive triumph of what I’d call the “romance of statistics” as a defining feature of modern life: the conviction that there’s a sublimity lurking somewhere in the world of statistical data; the sense that aggregate, collective tales are the defining stories, the epic narratives of an era in which the multitude is history’s protagonist. (It matters little whether the multitudes in question are consumers, opinion holders, content creators, or protesters. The story of the one is the story of the many in modern-era masses made up of individuals.)
TCT: The remix or mashup artists that the previous question invokes present another confusion, that between creativity and originality. Does a design methodology built on search foreclose on the possibility of originality? Does it collapse it into the more general creativity?
JS: Creativity has always been a contingent, socially constructed notion, and I see no reason to exclude remix- and mashup-based work from creative practice. After all, (re)search was integral to pre-digital design practices as well. It just assumed a different form: a trip to the local Volksmuseum, stock house, or library; cracking the pages of a catalog or book.
Among the differences between analog and digital search processes, however, is the absolute flexibility and fungibility of the objects “produced” by means of digital searching. A screen of Google images is generated by upsizing or downsizing every image so that it fits precisely within a rigid template that eliminates all voids. And every image that makes up the grid is itself a resizable surrogate that can be viewed outside of any given context. What one gains in flexibility one loses in terms of the object’s basic sensory attributes and physical affordances. (Those tribal fabric samples gathered in the storage case of a Musée d’Anthropologie won’t scale up or down; and they may well speak their textures only to the sense of touch.)
There’s always something to be lost and something to be gained. So artistry assumes the form of locating and tapping into the expressive power of vastly expanded universes made up of surrogates. Perhaps a different kind of originality is at issue here with respect to the Romantic one that is implied when we speak of modern forms of “originality”: rather than seeking out an origo or arché, the designer is a master of flow controls (in the form of data scraping, shaping, crafting, sculpting).
TCT: Twitter has recently given Deb Roy of the MIT Media Lab funding and access to not only its historical data, but also its fire hose of daily tweets.4 Some scholars – Tafuri and Vidler come to mind – have argued the speciousness of operating as a historian in the immediate present. As our sources of historical material become available in real time, is there a historiographical role for historians to play? As a cultural historian and leading pioneer of the digital humanities, what opportunities do you see for a real time historiographical method? Or, is this simply confusing history with theory?
JS: Though I greatly admire both Tafuri and Vidler, I am intrigued by the notion of expanding the compass of historiography to encompass the immediate present, even if such an expansion unbalances our conventional sense of the existence of a clear partition line between “serious” scholarship and “mere” journalism, historical inquiry and analysis of the present. What counts as “historical” has always been a matter of contention within the field of history and it’s worth recalling that, up until a century ago, even modern history (not to mention, contemporary history) was often dismissed as unsuitable for historical analysis.
Real-time documentation, data that emerge right at the edge of unfolding events and that surround them like a halo even as events shapeshift, can allow us to excavate and tell stories on scales and from perspectives that were hitherto unthinkable. And the stories in question can be shaped both in traditional ways and in ways that reach audiences that are unlikely ever to pick up a scholarly monograph or read a specialized journal. I call this expanded field of scholarly practice “knowledge design,” a label that I prefer to “digital humanities.” I have recently published a pamphlet on it for the Volkswagen Foundation, which can be downloaded online. 5
By their very nature, traditional archives sculpted “events” into the sorts of sizes and scales that the recording and documentation technologies of a given era could support. We now have the opportunity to do so on other sizes and scales. The craft of the historian, the historian’s commitment to depth, complexity, attention to context, and to standards of evidence and proof, seems to me in no way compromised by writing histories of the present any more than it was by writing microhistories of forgotten phenomena or longue durée histories.
- 1. Digital Archive of Japan’s 2011 Disasters, http://jdarchive.org/en/home, accessed November 13, 2014.
- 2. Curarium, http://curarium.com/, accessed November 13, 2014.
- 3. Kate Crawford, “The Anxieties of Big Data,” The New Inquiry, May 30, 2014, accessed November 13, 2014.
- 4. Christina Farr, “Twitter grants $10 million to MIT for social data analysis, new tools,” Reuters, Oct 1, 2014, accessed November 13, 2014.
- 5. Jeffrey Schnapp, “Knowledge Design” at Jeffreyschnapp.com, accessed November 13, 2014.
Faculty co-director of the Berkman Center for Internet and Society, Jeffrey T. Schnapp is Professor of Romance Languages & Literature and Comparative Literature, and also on the teaching faculty in the Department of Architecture at Harvard’s Graduate School of Design. He is the faculty director of metaLAB (at) Harvard. Before moving to Harvard in 2011, he occupied the Pierotti Chair of Italian Studies at Stanford, where he founded the Stanford Humanities Lab in 1999, a lab which he led and directed until his departure.