Big Data, Small Data, and the Ethics of Scale

This past summer, two Cornell University scholars and a researcher from Facebook’s Data Science unit published a paper on what they termed “emotional contagion.” They claimed to show that Facebook’s news feed algorithm, the complex set of instructions that determines what shows up where in a news feed, could influence users’ emotional states. Using a massive data set of 689,003 Facebook accounts, they manipulated users’ news feeds so that some people saw more positive posts and others more negative posts. Over time, they detected a slight change in what users themselves posted: Those who saw more positive posts posted more positive posts of their own, while those who saw more negative posts posted more negative ones. Emotional contagion, they concluded, could spread among people without any direct interaction and “without their awareness.”

Some critics lambasted Facebook for its failure to notify users that they were going to be part of a giant experiment on their emotions, but others simply thought it was cool. (My Infernal Machine colleague Ned O’Gorman has already outlined the debate.) Sheryl Sandberg, Facebook’s COO, just seemed confused. What’s all the fuss about, she wondered. This latest experiment “was part of ongoing research companies do to test different products.” Facebook wasn’t experimenting with people; it was improving its product. That’s what businesses do, especially digital businesses with access to so much free data. They serve their customers by better understanding their needs and desires. Some might call it manipulation. Facebook calls it marketing.

But, as technology writer Nicholas Carr points out, new digital technologies and the internet have ushered in a new era of market manipulation.

Thanks to the reach of the internet, the kind of psychological and behavioral testing that Facebook does is different in both scale and kind from the market research of the past. Never before have companies been able to gather such intimate data on people’s thoughts and lives, and never before have they been able to so broadly and minutely shape the information that people see. If the Post Office had ever disclosed that it was reading everyone’s mail and choosing which letters to deliver and which not to, people would have been apoplectic, yet that is essentially what Facebook has been doing. In formulating the algorithms that run its News Feed and other media services, it molds what its billion-plus members see and then tracks their responses. It uses the resulting data to further adjust its algorithms, and the cycle of experiments begins anew. Because the algorithms are secret, people have no idea which of their buttons are being pushed — or when, or why.

Businesses of all sorts, from publishers to grocery stores, have long tracked the habits and predilections of their customers in order to better influence what and how much they consume. And cultural critics have always debated the propriety of such practices.

Eighteenth-century German scholars debated the intellectual integrity of publishers who deigned to treat books not only as sacred vessels of Enlightenment, but also as commodities to be fashioned and peddled to a generally unenlightened public. Friedrich Nicolai, one of late eighteenth-century Prussia’s leading publishers, described the open secrets of the Enlightenment book trade:

Try to write what everyone is talking about . . . If an Empress Catherine has died, or a Countess Lichtenau fallen out of favor, describe the secret circumstances of her life, even if you know nothing of them. Even if all your accounts are false, no one will doubt their veracity, your book will pass from hand to hand, it will be printed four times in three weeks, especially if you take care to invent a multitude of scandalous anecdotes.

The tastes and whims of readers could be formed and manipulated by a publishing trade that was in the business not only of sharing knowledge but also of producing books that provoked emotional responses and prompted purchases. And it did so in such obvious and pandering ways that its manipulative tactics were publicly debated. Immanuel Kant mocked Nicolai and his fellow publishers as industrialists who traded in commodities, not knowledge. But Kant did so in public, in print.

These previous forms of market manipulation were qualitatively different from those of our digital age. Be they the practices of eighteenth-century publishing or mid-twentieth-century television production, these forms of manipulation, claims Carr, were more public and susceptible to public scrutiny, and as long as they were “visible, we could evaluate them and resist them.” But in an age in which our online and offline lives are so thoroughly intertwined, the data of our lives (what we consume, how we communicate, how we socialize, how we live) can be manipulated in ways and to ends of which we are completely unaware and which we have increasingly less capacity to evaluate.

Sheryl Sandberg would have us believe that Facebook and Google are neutral tools that merely process and organize information into an accessible format. But Facebook and Google are also companies interested in making money. And their primary technologies, their algorithms, should not be extracted from the broader environment in which they were created and are constantly tweaked by particular human beings for particular ends. They are pervasive and shape who we are and who we want to become, both individually and socially. We need to understand how to live alongside them.

These are precisely the types of questions and concerns that a humanities of the twenty-first century can and should address. We need forms of inquiry that take the possibilities and limits of digital technologies seriously. The digital humanities would seem like an obvious community to which to turn for a set of practices, methods, and techniques for thinking about our digital lives, both historically and conceptually. But, to date, most scholars engaged in the digital humanities have not explicitly addressed the ethical ends and motivations of their work. (Bethany Nowviskie’s work is one exemplary exception.)

This hesitance has set them up for some broad attacks. The recent diatribes against the digital humanities have not only peddled ignorance and lazy thinking as insight; they have also, perhaps more perniciously, managed to cast scholars interested in such methods and technologies as morally suspect. In his ill-informed New Republic article, Adam Kirsch portrayed digital humanities scholars as morally truncated technicians, obsessed with method and either uninterested in or incapable of ethical reflection. The digital humanities, Kirsch would have us believe, is the latest incarnation of the Enlightenment of Adorno and Horkheimer: a type of thinking interested only in technical mastery and unconcerned about the ends to which knowledge might be put.

Most of the responses to Kirsch and his ilk, my own included, didn’t dispute these more implicit suggestions. We conceded questions of value and purpose to the bumbling critics, as though to suggest that the defenders of a vague and ahistorical form of humanistic inquiry had a monopoly on such questions. We conceded, after a fashion, the language of ethics to Kirsch’s image of a purified humanities, one that works without technologies and with insight alone. We responded with arguments about method (“You don’t know what digital humanities scholars actually do.”) or history (“The humanities have always been interested in patterns.”).

In a keynote address last week, however, Scott Weingart encouraged humanities scholars engaged in computational analysis and other digital projects to think more clearly about the ethical nature of the work they are already doing. Echoing some of Carr’s concerns, he writes:

We are at the cusp of a new era. The mix of big data, social networks, media companies, content creators, government surveillance, corporate advertising, and ubiquitous computing is a perfect storm for intense influence both subtle and far-reaching. Algorithmic nudging has the power to sell products, win elections, topple governments, and oppress a people, depending on how it is wielded and by whom. We have seen this work from the bottom-up, in Occupy Wall Street, the Revolutions in the Middle East, and the ALS Ice-Bucket Challenge, and from the top-down in recent presidential campaigns, Facebook studies, and coordinated efforts to preserve net neutrality. And these have been works of non-experts: people new to this technology, scrambling in the dark to develop the methods as they are deployed. As we begin to learn more about network-based control and influence, these examples will multiply in number and audacity.

In light of these new scales of analysis and the new forms of agency they help create, Weingart encourages scholars, particularly those engaged in network and macroanalysis, to pay attention to the ways in which they mix the impersonal and individual, the individual and the universal. “By zooming in and out, from the distant to the close,” he writes, digital humanities scholars toggle back and forth between big and small data. Facebook, Google, and the NSA operate primarily at a macro level at which averages and aggregates are visible but not individuals. But that’s not how networks work. Networks are a messy, complex interaction of the micro and macro. They are products of the entire scale of knowledge, data, and being. Social networks and the ideas, actions, and interactions that comprise them emerge between the particular and the universal. What often distinguishes “the digital humanities from its analog counterpart,” writes Weingart, “is the distant reading, the macroanalysis.” But what binds humanities scholars of all sorts together is an “unwillingness to stray too far from the source. We intersperse the distant with the close, attempting to reintroduce the individual into the aggregate.” In this sense, scholars interested in a digital humanities are particularly well suited to challenge basic but dangerous misconceptions about the institutions and technologies that shape our world.

If we think of Facebook and Google and the computations in which we are enmeshed merely as information-processing machines, we concede our world to one end of the scale, a world of abstracted big data and all-powerful algorithms. We forget that the internet, like any technology, is both a material infrastructure and, as Ian Bogost has put it, something we do. Every time we like a post on Facebook, search Google, or join the network at a local coffee shop, we participate in this massive, complex world of things and actions. We help form our technological world. So maybe it’s time we learn more about this world and remember that algorithms aren’t immutable, natural laws. They are, as Nowviskie puts it, rules and instructions that can manipulate and be manipulated. They are part of our world, bound to us just as we are now to them.

