Contextualizing Big Data and Lessons for DML

Recently I had the opportunity to go to the Internet Research 13.0 (IR13) conference hosted by the University of Salford. One popular theme of the conference was the increasingly important role that big data is playing in Internet research. As more and more people use social media sites, the amount of data they generate is growing exponentially. Consequently, as researchers seek to learn more about what people do online, they are increasingly turning to this data (when they can get access to it) to make sense of the ways that the Web is continually influencing culture and communication.

While big data has been an object of inquiry for a while in the hard sciences, it has only recently become an object of study in the humanities and social sciences. As fields from sociology to English begin to grapple with big data, one of the issues researchers have to address is how it fits into some traditional forms of disciplinary research, that is, those focused not on counting results but rather on qualitative features of Internet use and communication. While it is easy to build network maps of Twitter users’ connections with each other, for example, these maps often fall short in explaining how these users interact on the site or the types of communication they participate in.

This tension—between the resources of big data and the detail and contextualization that come with “small data”—was the subject of one of the IR13 panels I attended. One of the speakers, Nancy Baym of Microsoft Research New England, discussed the limits of big data and how researchers need to contextualize any research that we do on online writing spaces like Twitter. As one of the attendees summarized her question on Twitter:

It occurred to me that thinking about the interplay of large-scale data and more detailed, qualitative data can be useful for teachers interested in DML. While teachers obviously don’t deal with huge data sets, they are often challenged by the use of numerical data, such as standardized test results, to determine important issues like student placement. One of the goals of DML practitioners is general education reform that would allow for new ways of calculating student performance in light of the qualitative nature of what teachers know about their students. I’m not arguing that large-scale data like test scores should be abandoned, but rather, as Baym argues of big data, that this information should be balanced and contextualized by other information sources. Just as social media has provided Internet researchers with a wealth of new data about communication practices online, digital communication tools can provide teachers and students with new ways of interacting, sharing information and resources, and participating in communities of learning that extend beyond the classroom, perhaps providing one means of achieving this balance.