In the last few years, Educational Data Science has emerged as a new field of inquiry in educational research. Where did it come from, what is its likely future impact on the production of knowledge about educational practices and learning processes, and how might it affect studies in digital media and learning?
In sociological research, it has become quite fashionable to conduct studies of particular academic fields, their historical origins and development, and their methods of knowledge production. Influential research has been conducted, for example, to trace the development of psychology, neuroscience, behavioural sciences, and the various emerging branches of data scientific practice associated with big data. As an emerging field, educational data science might also be subjected to such a study, with the aim of understanding how its development in the recent past might influence the practices, methods and knowledge of educational research in the near future. We might call this a genealogy of educational data science — an attempt to trace the complex family tree of influences, institutions and individuals that have come together in the present moment to make it a legitimate field of expert practice and understanding.
Origins of Educational Data Science
We can get some sense of the genealogy of educational data science from its main advocates. For example, Philip Piety, Daniel Hickey and MJ Bishop have defined educational data science in terms of a combination of “Academic/Institutional Analytics,” “Learning Analytics/Educational Data Mining,” “Learner Analytics/Personalization,” and “Systemic Instructional Improvement.” These four areas of research and development, they argue, have begun to coalesce around shared questions, problems and assumptions over the last decade. They particularly highlight how a community of researchers began to converge around educational data mining from about 2005, and more recently to team up with the learning analytics community to form a field that “has begun to receive combined attention from both federal policymakers and foundation funders and is often seen as the community dealing with ‘Big Data’ in education.”
These authors, then, date the emergence of educational data science to around 2004-2007, as various forms of educational and learning analytics and data mining practices and communities combined. They term it a “sociotechnical movement” with shared interests that cut across the boundaries of their original communities. By “sociotechnical movement” they mean that “the enabling conditions and key technologies emerge across sectors giving rise to multiple sets of innovations that may at times seem disconnected, but are often related and interdependent.”
They also point out that a sociotechnical movement can gain traction when society’s “expectations are such that the innovations come at a time when there is other general interest in the kinds of changes that the innovations make possible.” Accordingly, they note that recent years have brought both an increasing capability to produce data and a greater public appetite for the use of data across many areas of education. New forms of evidence (log files, conversational records, peer assessments, online search and navigation behavior, and others) are, they argue, raising big questions and disrupting traditional ways of working in educational research, “acting in a way similar to disruptive innovations that alter cultural, historical practices and activity systems.”
Educational Data Science Expertise
But what do such developments mean in terms of expertise? In a conference panel convened in 2013, Simon Buckingham Shum and colleagues observed that “while the learning analytics and educational data mining research communities are tackling the question of what data can tell us about learners, relatively little attention has been paid, to date, to the specific mindset, skillset and career trajectory of the people who wield these tools.” They termed educational data scientists a “scarce breed,” and noted in particular that they would need to be experts in both learning analytics and educational data mining, as well as in a host of related techniques. In a collaborative presentation defining the field, Philip Piety, John Behrens and Roy Pea traced its disciplinary origins to the computer science techniques of computational statistics, data mining, machine learning, natural language processing and human-computer interaction.
Further taking up the challenge of defining the mindset and skillset of educational data scientists, Roy Pea of Stanford University has proposed a new “specialized” field combining the sciences of digital data and learning, along with the construction of a “big data infrastructure” for learning: a set of data science and computer science techniques that could be harnessed to analyse large volumes of educational and learning data. Specifically, his report identifies “several competencies for education data science,” including:
- Computational and statistical tools and inquiry methods, including traditional statistics skills … as well as newer techniques like machine learning, network analysis, natural language processing, and agent-based modeling
- General educational, cognitive science, and sociocultural principles in the sciences of learning…
- Principles of human–computer interaction, user experience design, and design-based research
- An appreciation for the ethical and social concerns and questions around big data, for both formal educational settings and non-school learning environments
Likewise, Kristen DiCerbo and John Behrens, of the commercial educational publisher and software vendor Pearson, have avidly advocated a new datafied science of learning, arguing that as “teaching and learning becomes digital, data will be available not just from once-a-year tests, but also from the wide-ranging daily activities of individual students … in real time. … [W]e need further research that brings together learning science and data science to create the new knowledge, processes, and systems this vision requires.”
These presentations and reports clearly demand a lot of expertise from educational data scientists. Roy Pea’s report calls for much more support from governments for this sector, and details the need for new undergraduate and graduate courses to support its development. Pearson, for its part, has established a Center for Digital Data, Analytics and Adaptive Learning where it practises educational data science in-house. Its director, John Behrens, is an expert in measurement and statistics whose research focuses on how “the billions of bits of digital data generated by students’ interactions with online lessons as well as everyday digital activities can be combined and reported to personalize learning,” while other staff in the center are described as “research scientists” with expertise in data mining, computer science, algorithm design, intelligent systems, human-computer interaction, data analytics tools and methods, and interactive data visualization.
If educational data science is an emerging sociotechnical movement and a new field of expertise, it is located not just in academic research settings but also in new research centers at major commercial companies. This set-up is not unlike that of Facebook’s Data Science Team and its collaboration with Cornell University (now notorious for the emotional contagion experiment), and it is symptomatic of a wider convergence of academic and commercial interests and methodologies in data science in general.
Educational data science is not, then, simply a technical field of expertise in statistical forms of analysis; it is also deeply rooted in “learning science,” a field itself largely defined in terms of concepts and methods from the psychological and cognitive sciences. The expertise of an educational data scientist is a hybrid of computer science (CompSci) and psychology (psy-science, as it is sometimes termed). I’ve elsewhere referred to the combination of CompSci and the psy-sciences as a ‘CompPsy’ hybrid, and argued that the juxtaposition of computational methods of analysis with psychological concepts is giving rise to new theories and understandings of learning that appear to challenge the accounts offered by educational researchers on the basis of empirical fieldwork, ethnographies and other situated methodologies. Instead, CompPsy practices take place in a microlaboratory inside a computer system, using data analysis techniques to detect patterns in the millions of digital traces left when users undertake a task or activity. The digital microlaboratories of educational data science are small enough to be written in computer code, yet also massively distributed, aggregating individuals’ data into big population datasets that can be analysed for patterns at huge scale.
This raises important questions about the actual subjects of the research conducted by educational data scientists, and the potential insights that can be drawn from analyzing their data. Kristen DiCerbo and John Behrens of Pearson’s Center for Digital Data, Analytics and Adaptive Learning argue that as learners interact with systems and with other people, “software records” every aspect of their activity, with the result that “these developments have the potential to inform us about patterns and trajectories for individual learners, groups of learners, and schools. They may also tell us more about the processes and progressions of development in ways that can be generalised outside of school.” The promise of the educational data science methods being pioneered and practised by Pearson is therefore not simply better tracking of learners but the generation of new, generalizable theories and models of cognitive development and learner progression. Likewise, Roy Pea has highlighted “a pre-eminent objective” in educational data science of “creating a model of the learner.”
In another Pearson paper on the methodological challenges of analyzing educational big data, John Behrens claims that insights extracted from the huge quantities of educational data now being generated will challenge current theoretical frameworks in education research, as “new forms of data and experience will create a theory gap between the dramatic increase in data-based results and the theory base to integrate them.” The CompPsy laboratories of educational data science focus on models and patterns derived from digital trace data, and mobilize those patterns and models in the construction of new theories of learning itself. Such practices and methods relocate the subjects of educational research from situated settings and psychological labs to the digital laboratory inside the computer, and in doing so transform those subjects from embodied individuals into numerical patterns, data models, and visualized artefacts. Companies like Pearson may then be able to use the resulting insights in the production of new e-learning software products that can be marketed to schools and colleges.
The Future of Educational Data Science
Educational data science is an emerging multidisciplinary field of technical and methodological expertise, one that has developed as a movement across academic and commercial settings through a family tree of influences and practices, and that has significant potential to transform aspects of educational research and theory in future years—especially if funding and governmental backing accumulate to support its ambitions.
But we can see educational data science being enacted already in multiple settings where educational data is being tracked, mined and analysed. The “startup schools” emerging from Silicon Valley that I detailed in a previous piece, such as AltSchool, Khan Lab School and soon Mark Zuckerberg’s The Primary School, are all based on an educational data scientific approach, using data tracking and analytics to gain insights into the learners who attend them, in order both to “personalize” their pedagogic offerings through adaptive platforms and to test and refine their own psychological and cognitive theories of learning. The future of educational data science is already being prototyped in such settings, with CompPsy labs installed directly within school structures.
Moreover, as the world’s largest educational publisher and e-learning provider, Pearson is inserting educational data science practices and methods into schools and colleges worldwide. Notably, Pearson has partnered with Knewton, a major learning analytics provider, to power its digital content:
“The Knewton Adaptive Learning Platform™ uses proprietary algorithms to deliver a personalized learning path for each student…. ‘Knewton adaptive learning platform, as powerful as it is, would just be lines of code without Pearson,’ said Jose Ferreira, founder and CEO of Knewton. ‘You’ll soon see Pearson products that diagnose each student’s proficiency at every concept, and precisely deliver the needed content in the optimal learning style for each. These products will use the combined data power of millions of students to provide uniquely personalized learning.’”
As Knewton’s platform and Pearson’s products scale out globally, each constructed to perform automated diagnostics on every individual learner and then algorithmically personalize their learning pathways, we can see how a virtual microlaboratory of educational data analysis may be installed directly in the software with which students interact. As digitized learning environments increasingly become the norm (and Pearson and Knewton aspire to make this a reality), young people will be learning in the digital microlaboratory itself, the constant subjects of diagnostic learning analytics and adaptive learning platforms.
Educational data scientists, as a professional field of expertise, may yet be a scarce breed or community, but their analytic practices and methods are already beginning to populate the schools, the colleges and the software itself through which young people experience digital media and learning. The digital microlaboratories of educational data science are new sources of knowledge production with direct access to research subjects at very large scale. Understanding something of the origins, methods and working practices of this nascent field may well prove useful for researchers of digital media and learning, not least because the field may be challenging the legitimacy and authority of our claims to be able to know and theorize education and learning itself.
Banner image credit: Jorge Franganillo