Making Education as Machine-readable as Digital Data

Data have long been used to manage education. Data appear to make the messy complexity of schools and schooling easier to understand, and to help policymakers make decisions. Now, with the rise of “big data” and associated data processing, mining and analytics software, a new style of digital education policymaking is making education increasingly machine-readable.

In particular, education policy is now being influenced to a significant degree by the design of the devices through which educational data are collected, calculated, analysed, interpreted and visualized. As a result, schools and classrooms are being configured as machine-readable “data centres” linked to vast global data collection programmes, and the “reality” of education is being rearticulated in numerical practices enabled by software developments, data companies, and data analysis instruments. The shift toward data in the education policy process can be illustrated with a number of examples of digital education governance in action, especially the work of spreadsheets, visualizations and analytics.

The Organisation for Economic Co-operation and Development (OECD) has been at the forefront of developing data systems in education, particularly through its Programme for International Student Assessment (PISA). PISA is an international survey that aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students every three years. It generates massive datasets on educational performance that can be downloaded from the programme website. These consist of data files in ASCII format, together with codebooks, compendia and control files that allow statistical software packages such as SPSS to process the data.

In order to help users make sense of the data, the OECD has also launched Education GPS, an interactive data analysis site that specifically facilitates easy access to PISA data in order to compare countries on a “wide range of indicators.” Through a simplified menu system, it allows the user to compile PISA data in order to “create your own, customized country reports, highlighting the facts, developments and outcomes of your choice,” and to compare and review different countries’ educational policies. The user can generate extensive customized datasets, and the Education GPS tools can generate maps overlaid with data representations, charts, and scatterplots. Essentially, the OECD enables education systems to be translated into machine-readable numerical data files, housed in spreadsheets, which can then be mobilized through seemingly scientific, objective and nonpolitical modes of measurement and graphical presentation to inform educational decision making. Education GPS exemplifies how spreadsheets have come to fulfil a powerful function in education, making the “reality” of learning intelligible in millions of numerical measurements and visible in interactive data visualizations.

As a data source that is enacted through spreadsheet software, PISA has become a major policy instrument of educational governance. Through the humble spreadsheet, PISA enables educational progress over time to be enumerated, tabularized and made intelligible. From there, the data can be taken into meetings and policy conversations, and used as the digitally encoded basis for embodied human decision making.

As already noted, a key technique of digital education governance is data visualization. A notable producer of data visualizations in education is the global educational publisher Pearson Education. Pearson’s Learning Curve Data Bank combines 60 datasets on educational performance in countries around the world to produce a “Global Index” ranked in terms of “educational attainment” and “cognitive skills.” Mirroring the shift toward the continuous collection of “big data,” the Learning Curve is updated rapidly as new datasets become available. It is also highly relational, allowing multiple datasets to be joined together.

The Learning Curve also features a suite of dynamic and user-friendly mapping and time series tools that allow countries to be compared and evaluated both spatially and temporally. Countries’ performance in terms of educational attainment and cognitive skills is represented on the site as semantically resonant “heat maps.” The site also lets the user generate “country profiles” that visually compare multiple “education input indicators” (such as public educational expenditure, pupil:teacher ratio and educational “life expectancy”) with “education output indicators” (PISA scores, graduation rates, labour market productivity), as well as with “socio-economic indicators” (such as GDP and crime statistics).

These visual methods give the numbers meaning. They translate numerical measurements into curves and trends, and they make the data amenable to being inserted into presentations and arguments that might be used to convince others. In other words, data visualization gives numbers enough pliability and plasticity to be shaped and configured into powerful and persuasive presentations. The Learning Curve is a potent technique of political visualization for envisioning the educational landscape, operationalizing the presentation and re-presentation of numbers for a variety of purposes, users and audiences. Michael Barber, Pearson’s Chief Education Adviser, who launched the Learning Curve, has described it as allowing the public to “connect those bits together” in a way that is more “fun” and “co-creative” than preformatted policy reports.

Even so, as an interactive and co-creative policy instrument, the Learning Curve is no neutral device. It’s important to acknowledge that any data visualization is an expertly crafted methodological and technical accomplishment, not simply a visual reproduction of some underlying reality. Any visualization produced using software and digital data is an “interfacial site” created through networks of human bodies at work with various kinds of software and hardware, facilitated by vast repositories of code and databases of fine-grained information, and possesses productive power to shape people’s engagement and interaction with the world it represents.

Within the carefully designed interactive environment of the Learning Curve, users’ own analyses are in effect preformatted by the design of the interface. Its design encourages users to conduct their own country comparisons, and to see themselves as data analysts who engage with the data to make their own informed judgments about it. At the same time, the Learning Curve is structured according to the social media logic of “prosumption,” in which users are seen not simply as consumers of data but as its producers too. The Learning Curve therefore reconfigures education governance as a form of “play” and “fun” that is consonant with the logics of social media participation and audience democracy in the popular domain. Yet it also preformats the possible results of such activities through the methodological preferences built into its interface. It incites the wider publics of education to see themselves as comparative data analysts and as participatory actors in the flow of comparative data, but subtly configures and delimits what users can do and what can be said about the data.

Predictive analytics
The emergence of big data in education means that data can increasingly be collected and analysed automatically and in real time. Pearson Education, for example, has established a Center for Digital Data, Analytics, and Adaptive Learning that is intended to “make sense of learning in the digital age.” The center has produced a report on the impacts of big data on education that envisions education systems where “teaching and learning becomes digital” and “data will be available not just from once-a-year tests, but also from the wide-ranging daily activities of individual students.”

Pearson has also published a report calling for an “educational revolution” in the policy process. It calls for the focus to shift from governing education through the institution of the school to “the student as the focus of educational policy and concerted attention to personalising learning.” In particular, the report promotes “the application of data analytics and the adoption of new metrics to generate deeper insights into and richer information on learning and teaching.” It also promotes “online intelligent learning systems” and the use of data analytics and automated artificial intelligence systems to provide “ongoing feedback to personalize instruction and improve learning and teaching.” Pearson’s aim is to shift away from large-scale testing toward less obtrusive methods of collecting and analysing performance data.

Ultimately, the data analytics being developed at Pearson anticipate a new form of “up-close” digital educational governance. These analytics capacities complement existing database techniques of governance, which rely on large-scale tests such as PISA administered at discrete intervals. To some extent, they also short-circuit those techniques. The deployment of big data practices in schools is intended to accelerate the timescales of governing by numbers. The collection of enumerable educational data, its processes of calculation and its consequences become a real-time, recursive process. That process is materialized and operationalized “up close” from within the classroom and regulated “at a distance” by new centres of expertise in digital data analytics, visualization and statistical calculation.

Data experts
Digitally rendered as a vast surface of machine-readable data traces, education is becoming increasingly amenable to being effortlessly and endlessly crawled, scraped and mined for policy insights, then enumerated, visualized and predicted in order to inform decision making. The new managers of this virtual world of educational data are technical, statistical, methodological and graphical experts, both human and non-human. They translate schools and the learners within them into enumerable, visualized and anticipatory data, and they address their audiences as particular kinds of interactive users. New kinds of data careers have opened up for the educational data scientists, experts and algorithmists who are required to do the data work, construct the database architectures and design the analytics that now make education policy operational and productive.

The techniques produced and promoted by such data experts appear to move education governance to new centres of technical expertise and judgment, and to accelerate the timescales of digital data collection and use in education. They complement massive longitudinal datasets, such as those held by national governments and large international organizations, with more dynamic, automated and recursive systems intended to sculpt learners’ performances in real time through the “personalized” pedagogic instruments of the classroom. Looking at educational data in this way means going beyond the “policy numbers” (the data themselves) to acknowledge the combination of technologies, human actors, institutions, methodologies, and social and political contexts that frame and shape their production and use. This requires us to consider several elements together: the specific instruments, code and algorithms; the human hands, eyes and minds; the companies and agencies; and the wider contexts that constitute the infrastructures of educational data production and give those data their productive power.

Banner image credit: NessieNoodle