Why Big Data Requires the Social Sciences

This is the first in a series of posts that will examine the big data phenomenon from the point of view of the social sciences (I am a historian by training, but I will also try to bring in perspectives from sociology, anthropology, and philosophy). Since it’s not necessarily obvious what these fields have to say about big data, I thought it would be worth beginning with a post on why it is useful to think about big data from social science perspectives. Indeed, doing so is not only useful but necessary if we are going to get the most out of big data.

Before I launch into my main points, though, I want to be clear about what I am talking about here. Big data can also contribute to the social sciences: it is possible to do big data social science. For example, plenty of people use big data social analytics to do sociological work that might tell them about friendship, family, communication practices, and so on. This is not what I am concerned with here. Rather, I’m interested in using tools from the social sciences to understand the assumptions and structures of big data, and what they might mean for how we do business or think about the world differently.

1. Technology & society

The social sciences have spent a lot of time thinking about the relationship between technology and society. Big data are, of course, part of a technological system. They emerge from computers, databases, and the World Wide Web. But big data are not only a technological phenomenon. Data are collected for particular purposes (e.g., because they are valuable) by and for particular individuals or groups. In other words, which data are collected, how they are collected, from where and from whom, and in what quantity depends not just on technological capability but on the social and political interests at stake (a point recently recognized by the White House, both in its report and in the conference on ‘The social, cultural, and ethical dimensions of Big Data’).

This is true of other kinds of technological systems as well. The form taken by televisions, bridges, power tools, or missiles doesn’t just depend on technical or engineering questions, but also on social and political interests. If we want to understand why these things are the way they are, we have to understand them not merely as technological objects but also as ‘social’ objects. Likewise, all but the most superficial analysis of data will need to take account of how and why a particular set of data came to be.

My first point, then, is that the social sciences are uniquely equipped to unpack these ‘social’ aspects of big data and to help us understand their significance.

2. Structures and blindspots

The social sciences are also in the business of looking behind the curtain to see what is there. This applies not just to data themselves, but also to the technologies that produce and analyze data (algorithms, databases, etc.). When we begin to examine these with an eye towards social and political interests, we, again, see systems that are designed for very specific purposes. Social scientists are good at discovering the assumptions, agendas, and ideas that get built into such systems, often unintentionally. National education systems, for instance, are designed and built to produce productive workers and good citizens. But they also often erect structures that reinforce social privilege and serve to reproduce an elite class.

What assumptions are built into big data infrastructures? What sorts of outcomes are they likely to produce, and what sorts of outcomes are likely to be obscured or absent? What sorts of questions are easy to ask with big data, and what sorts are hard? Social scientists are trained to look for the hidden assumptions that get built into technological systems and to point out the blindspots these may create. One of the people who has started to think about this seriously is danah boyd (see her article, written with Kate Crawford, ‘Six Provocations for Big Data’).

3. Authority and objectivity

Finally, social scientists are trained to be skeptical of claims to objectivity and certainty in knowledge. Big data is often assumed to be more authoritative by virtue of its bigness, and working with it is sometimes described as ‘letting the data speak for itself’. Social scientists have a toolkit to unpack such claims. Natural scientists sometimes claim to be ‘letting nature speak for itself’, but sociologists and historians of science have shown that this is, more than anything else, a rhetorical claim: nature cannot speak, and it is in fact the scientists who are doing the talking.

Data, too, does not speak for itself. As some social scientists have already pointed out, data is never raw; it always comes cooked in some form. Analyzing and understanding big data means analyzing and understanding how it has been cooked, for what purpose, and by whom. Social scientists are well equipped to subject it to this kind of scrutiny.

All this is not to say that we should never trust big data or that big data is inherently flawed. All ways of knowing are fallible. However, if we are going to get the most out of big data, we need to understand what its strengths and weaknesses are, how it can best be used, and where it may mislead us. Attention to the ‘social’ aspects of big data is an important way to begin this kind of analysis.