I started a list of critical questions for big data in education earlier this week. It is a big topic that raises serious questions and problems for further debate and discussion. Here, I focus on questions about big data ownership, divides, algorithmic accountability, issues about voice and literacy, and, finally, the ethical implications and challenges of big data in education.
Who “owns” educational big data?
The sociologist Evelyn Ruppert has asked, “who owns big data?” noting that numerous people, technologies, practices and actions are involved in how data are shaped, made and captured. The technical systems for conducting educational big data collection, analysis and knowledge production are expensive to build. Specialist technical staff are required to program and maintain them, to design their algorithms, and to produce their interfaces. Commercial organizations see educational data as a potentially lucrative market, and ‘own’ the systems that are now being used to see, know and make sense of education and learning processes. Many of their systems are proprietary, and are wrapped in intellectual property and patents, which make it impossible for other parties to understand how they are collecting data, what analyses they are conducting, or how robust their big data samples are. Despite claims to exhaustivity, big data can still only ever be a sample based on users of a platform or a system, not a true census of total populations, especially in education where access to the technologies required for big data collection purposes is highly uneven.
Specific commercial and political ambitions may also be animating the development of educational data analytics platforms, particularly those associated with Silicon Valley where ed-tech funding for data-driven applications is soaring and tech entrepreneurs are rapidly developing data-driven educational software and even new institutions. In this sense, we need to ask critical questions about how educational big data are made, analysed and circulated within specific social, disciplinary and institutional contexts that often involve powerful actors with significant economic capital and extensive social networks of support and influence.
Is a new “big data divide” emerging in education?
Not all schools, colleges or universities can necessarily afford to purchase a learning analytics or adaptive software platform, or to partner with platform providers. This risks certain wealthy institutions being able to benefit from the real-time insights into learning practices and processes that such analytics afford, while other institutions remain restricted to the more bureaucratic analysis of temporally discrete assessment events. In other words, a new educational data divide may be emerging where certain institutions will be able to gain a competitive advantage by having access to the insights available from educational data analytics services and platforms. This reflects the wider emergence of a “big data divide” that Mark Andrejevic has described as a separation between the “hands of the few who use it to sort, manage, and manipulate,” and those “without access to the database who are left with the ‘poor person’s’ strategies for cutting through the clutter: gut instinct, affective response, and ‘thin-slicing’ (making a snap decision based on a tiny fraction of the evidence).” To what extent might a new big data divide in education reinforce and reproduce existing forms of advantage and disadvantage, and exacerbate existing regimes of comparison, competition, and public ranking of institutions?
Can educational big data provide a real-time alternative to bureaucratic policymaking?
Policy makers in recent years have depended on large-scale assessment data to help inform decision-making and drive reform. Educational data mining and analytics can provide a real-time stream of data about learners’ progress, as well as automated real-time personalization of learning content appropriate to each individual learner. To some extent this changes the speed and scale of educational change by removing the need for the cumbersome assessment and country comparison techniques that have tended to underpin policy intervention in recent years. But it potentially places commercial organizations such as the global education business Pearson in a powerful new role in education, with the capacity to predict outcomes and shape educational practices at timescales that government intervention cannot match. Though standardized testing and country comparison have become widely critiqued as a mode of governance in education, the emerging alternative of real-time analytics raises questions about for-profit influence and the privatization of public education by tight networks of corporate education reformers.
Is there algorithmic accountability to educational analytics?
Learning analytics is focused on the optimization of learning, and one of its main claims is the early identification of students at risk of failure. What happens if, despite being enrolled on a learning analytics system that has personalized the learning experience for the individual, that individual still fails? Will the teacher and institution be accountable, or can the machine learning algorithms (and the platform organizations that designed them) be held accountable for their failure? Simon Buckingham Shum has written about the need to address algorithmic accountability in the learning analytics field, and noted that “making the algorithms underpinning analytics intelligible” is one way of at least making them more transparent. Significant questions remain, however, about fairness and equal treatment in relation to big data-based education, and particularly about where accountability lies when algorithmic data-processing systems narrow an individual’s opportunities in the name of “personalized” learning.
Is student data replacing student voice?
Data are sometimes said to “speak for themselves,” but education has a long history of encouraging learners to speak for themselves too. Is the history of student voice initiatives being overwritten by the potential of student data, which proposes a more reliable, accurate, objective and impartial view of the individual’s learning process unencumbered by personal bias? Or can student data become the basis for a data-dialogic form of student voice, one in which teachers and their students are able to develop meaningful and caring relationships through mutual understanding and discussion of student data?
Do teachers need “data literacy”?
Many teachers and school leaders possess little detailed understanding of the data systems that they are using, or are required to use. As glossy educational technologies like ClassDojo are taken up enthusiastically by millions of teachers worldwide, might it be useful to ensure that teachers can ask important questions about data ethics, data privacy and data protection, and can engage with educational data in an informed way? Despite calls in the US to ensure that data literacy become the focus for teachers’ pre-service training, there appears to be little sign that the provision of data literacy education for educational practitioners is being developed in the UK.
What ethical frameworks are required for educational big data analysis and data science studies?
The Council for Big Data, Ethics and Society recently published a white paper detailing many of the ethical implications of big data. It raised important issues and recommendations about the need for informed consent when collecting data from users of platforms, and called in particular for new “social, structural, and technical mechanisms to assess the ethical implications of a system throughout the entire development and analysis lifecycle.” The UK government recently published an ethical framework for policymakers for use when planning data science projects. Similar ethical frameworks are needed to guide the design of educational big data platforms and education data science projects. New kinds of privacy frameworks and conceptions of rights in relation to educational big data also need to be developed, drawing not least on existing considerations of the potential privacy harms associated with data collection, processing, and dissemination.
This list of questions is of course not exhaustive, but it helps, I think, to identify some of the key issues emerging as big data, analytics, algorithms and machine learning processes become integrated into educational institutions and practices.
Banner image credit: Torkild Retvedt