Critical Educational Questions for Big Data

Big data has arrived in education. Educational data science, learning analytics, computer adaptive testing, assessment analytics, educational data mining, adaptive learning platforms, new cognitive systems for learning and even educational applications based on artificial intelligence are fast inhabiting the educational landscape, in schools, colleges and universities, as well as in the networked spaces of online learning.

I was recently asked what I thought were some of the most critical questions about big data in education today. This reminded me of the highly influential paper “Critical questions for big data” by danah boyd and Kate Crawford, in which they “ask critical questions about what all this data means, who gets access to what data, how data analysis is deployed, and to what ends.” With that in mind, here are some preliminary critical questions to ask about big data in education; a second set of questions will follow next time.

How is “big data” being conceptualized in relation to education?

Large-scale data collection has been at the centre of the statistical measurement, comparison and evaluation of the performance of education systems, policies, institutions, staff and students since the mid-1800s. Does big data constitute a novel way of enumerating education, and how do we specifically think about big data in relation to education? The sociologist David Beer has suggested we need to think about the ways in which big data as both a concept and a material phenomenon has appeared as part of a history of statistical thinking and in relation to the rise of the data analytics industry. He argues social science still needs to understand “the concept itself, where it came from, how it is used, what it is used for, how it lends authority, validates, justifies, and makes promises.”

Within education specifically, how is big data being conceptualized, thought about, and used to animate specific kinds of projects and technical developments? Where did it come from (data science? computer science?) and who are its promoters and sponsors in education? What promises are attached to the concept of big data as it is discussed within the domain of education? It’s notable that the dominant discourse of big data in education is that of “personalization,” precisely the same discourse that catalyzes the social media industry, with experts in personalization from companies like Google now becoming influential educational entrepreneurs. We might wish to think about a “big data imaginary” in education: a particular way of thinking about and envisioning the future of education through the conceptual lens of big data, which is now animating specific ed-tech projects, becoming embedded in the material reality of educational spaces, and being enacted in pedagogic practice.

Is big data changing how we think and learn?

Media theorist N. Katherine Hayles claims we have always thought “through, with, and alongside media.” Historically the ways people think have been formed by techniques of print production. Today, Hayles claims, the more we work with digital technologies the more we appreciate the capacity of networked and programmed machines to carry out sophisticated cognitive tasks. As a consequence, computers are increasingly seen as extensions of thought and cognition. Are big data technologies changing how we think and learn? With new kinds of machine learning and cognitive computing systems, we might see ourselves as being extended into vast networks of automated learning, predictive cognition and encyclopaedic knowledge-making potential. Again, as Hayles notes, digital media are pushing us in the direction of faster communication, more intense information streams, and more integration of humans with “nonconscious” cognitive systems, all of which are exerting considerable effects on how people think, perhaps even on how their brains function. The potential capacity of big data-based forms of machine learning and cognitive systems to alter neurology and cognition clearly raises significant questions for education, not least about whether existing theories of learning are adequate to explain human-machine cognitive learning processes.

What theories of learning underpin big data-driven educational technologies?

Big data-driven platforms such as learning analytics are often claimed to “optimize learning,” but it is not always clear what the organizations and actors that build, promote and evaluate them mean by “learning.” Much of the emerging field of “educational data science” (which encompasses educational data mining, learning analytics and adaptive learning software R&D) is informed by conceptualizations of learning rooted in cognitive science and cognitive neuroscience. These disciplines tend to focus on learning as an “information-processing” event, treating it as something that can be monitored and optimized like a computer program, and pay less attention to the social, cultural, political and economic factors that structure education and individuals’ experiences of learning. Aspects of behaviourist theories of learning also persist in behaviour management technologies that collect data on students’ observed behaviours and distribute rewards to reinforce desirable conduct.

Many actors involved in educational big data analyses are also deeply informed by the disciplinary practices and assumptions of psychometrics and its techniques of psychological measurement of knowledge, skills, personality and so on. This reflects the combination of big data analytics with psychometrics that has been termed “psycho-informatics.” Are we seeing the birth of psycho-informatics as a new field of methodological invention and theory development in education via learning analytics, educational data mining and adaptive learning platform providers? Are the strongly psychological, neuroscientific and computational methods and concepts that dominate big data development in education adequate to the task of theorizing the messy, embodied, socially situated and socioculturally embedded complexity of learning?

How are machine learning systems used in education being “trained” and “taught”?

The machine learning algorithms that underpin much educational data mining, learning analytics and adaptive learning platforms need to be trained, and constantly tweaked, adjusted and optimized, to ensure the accuracy of their results, such as predictions about future events. This requires “training data”: a corpus of historical data with which the algorithms are “taught,” so that they can then find patterns in new data “in the wild.” Who selects the training data? How do we know if it is appropriate, reliable and accurate? Is there a “coded gaze” at work in the training of such systems, that is, “the embedded views that are propagated by those who have the power to code systems”? What if the historical data are in some ways biased, incomplete or inaccurate? Educational research has long asked questions about the selection of the knowledge in school curricula that is taught to students, and the ways it reflects and serves to reproduce the economic, social and cultural advantage of powerful groups. We may now need to ask about the selection of the data for inclusion in the training corpus of machine learning platforms: about the data themselves, the experts who select them, their assumptions about the purposes of education and processes of learning, and the goals that animate them. These data, and the coded gaze of the systems designed to process them, could be consequential for learners’ subsequent educational experience.

Moreover, we need to ask questions about the nature of the “learning” being experienced by machine learning algorithms. Enthusiastic advocates at companies such as IBM are beginning to propose that advanced machine learning is becoming more “natural,” with “human qualities” of learning, based on computational models of aspects of human brain functioning and cognition. To what extent do such claims appear to conflate the biological neural networks of the human brain that are mapped by neuroscientists with the artificial neural networks designed by computer scientists? Does this reinforce computational information-processing conceptualizations of learning, and risk treating young human minds and the “learning brain” as computable devices that can be debugged and rewired?

These questions are, of course, not exhaustive. Another set, coming next month, will focus on big data ownership, divides, algorithmic accountability, issues of student voice and data literacy, and, finally, the ethical implications and challenges of big data in education.

Banner image credit: Dave Herholz