Big Data and Educational Data Mining

Across many professional fields there has been a surge of interest in Big Data, with systems being developed that use digital data to improve everything from commerce to crime fighting. But what is the relevance of Big Data to education?

There has been concern recently that companies selling software to schools might use student data for commercial purposes, such as targeting advertisements. These concerns are legitimate, and legislation is being put in place to prevent such uses.

The kind of educational data mining discussed in this article, however, serves a different purpose: improving the value of education for students by helping schools better understand the patterns in their data and use them to respond to students' learning needs.

Schools, like other sectors of society, are collecting and storing increasing amounts of digital information. Typical sources include annual reports from national assessment programs such as NAPLAN, assessments that teachers conduct themselves, and software systems the school has purchased. These kinds of data provide a snapshot in time that might tell us where a student, or a group of students, is in their learning.

New types of information are also becoming available.

As schools make greater use of interactive digital learning environments, more data becomes available about learners, and it is available in real time as learners engage with learning tasks. But how do you make sense of this kind of data?

Educational Data Mining (EDM)

Educational Data Mining (EDM) grew out of research on intelligent learning environments, in which a computer-based system tracks what a learner has and has not mastered and intervenes with help as needed. To do this, the system needs algorithms and models that implement that process.

That's where EDM comes in. In general, the goal of EDM is to find new patterns in data and to develop new algorithms and statistical models that can be applied in digital learning.

EDM is used to analyse data to test learning theories and to refine the design of the learning systems themselves. It generally emphasises breaking learning down into small components that can be analysed and then used by software that adapts to the student.
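To make "tracking what a learner has mastered" more concrete, here is a minimal sketch of one well-known technique from this research area, Bayesian Knowledge Tracing (BKT). The article does not name a particular model; BKT is an illustrative choice, and the parameter values (`p_init`, `p_learn`, `p_slip`, `p_guess`) are made-up assumptions rather than estimates fitted to real student data. Each observed right or wrong answer to a task on one small skill updates the system's estimate of the probability that the student has mastered that skill.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Illustrative parameters for one skill (assumed values, not fitted to data)."""
    p_init: float = 0.2    # probability the skill is already mastered before practice
    p_learn: float = 0.15  # probability of learning the skill at each practice opportunity
    p_slip: float = 0.1    # probability of answering wrongly despite mastery
    p_guess: float = 0.2   # probability of answering correctly without mastery


def update_mastery(p_mastered: float, correct: bool, params: BKTParams) -> float:
    """Update the estimated probability of mastery after one observed response."""
    if correct:
        evidence = p_mastered * (1 - params.p_slip)
        posterior = evidence / (evidence + (1 - p_mastered) * params.p_guess)
    else:
        evidence = p_mastered * params.p_slip
        posterior = evidence / (evidence + (1 - p_mastered) * (1 - params.p_guess))
    # Allow for learning that may have happened during this practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: list[bool], params: BKTParams) -> list[float]:
    """Return the mastery estimate after each response, in order."""
    estimates = []
    p = params.p_init
    for correct in responses:
        p = update_mastery(p, correct, params)
        estimates.append(p)
    return estimates


if __name__ == "__main__":
    params = BKTParams()
    history = [False, True, True, False, True, True]  # made-up response sequence
    for step, p in enumerate(trace(history, params), start=1):
        print(f"after response {step}: P(mastered) = {p:.2f}")
```

An adaptive system built on an estimate like this might, for example, keep offering hints or extra practice on the skill until the estimate passes a chosen threshold; the threshold, like the parameters above, would be a design decision rather than something fixed by the model.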

The data that EDM is applied to vary in granularity. At the finest level are individual keystrokes or mouse clicks. At an intermediate level are students' responses to tasks. At coarser levels, data can be grouped by session and by student, and even by classroom, teacher, and school.

These levels are nested inside one another (responses within sessions, sessions within students, students within classrooms, and so on), which can complicate analysis. The data are also ordered in time and sequence.
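The nesting can be illustrated by rolling response-level records up to coarser levels. The sketch below is a minimal illustration only: the record fields (`school`, `classroom`, `student`, `session`, `task`, `correct`), the function `proportion_correct_by`, and the small log of responses are all hypothetical, not the format of any particular learning system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One student response to a task (hypothetical log format)."""
    school: str
    classroom: str
    student: str
    session: int
    task: str
    correct: bool


def proportion_correct_by(responses, level):
    """Group responses by a nesting level and return the proportion correct per group.

    `level` is a tuple of field names. Including the enclosing levels in the key
    keeps groups distinct, e.g. the same classroom name in two different schools.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [correct count, total count]
    for r in responses:
        key = tuple(getattr(r, field) for field in level)
        totals[key][0] += int(r.correct)
        totals[key][1] += 1
    return {key: c / n for key, (c, n) in totals.items()}


SESSION = ("school", "classroom", "student", "session")
STUDENT = ("school", "classroom", "student")
CLASSROOM = ("school", "classroom")

if __name__ == "__main__":
    log = [  # made-up example data
        Response("S1", "7A", "amy", 1, "t1", True),
        Response("S1", "7A", "amy", 1, "t2", False),
        Response("S1", "7A", "amy", 2, "t3", True),
        Response("S1", "7A", "ben", 1, "t1", False),
        Response("S1", "7B", "cai", 1, "t1", True),
    ]
    for name, level in [("session", SESSION), ("student", STUDENT), ("classroom", CLASSROOM)]:
        print(name, proportion_correct_by(log, level))
```

In this made-up log, classroom 7A scores 0.5 when all its responses are pooled, but averaging its two students' scores (2/3 and 0) gives about 0.33, because the students contributed different numbers of responses. Small discrepancies like this are one way nesting complicates analysis.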

Two prominent EDM researchers, Ryan Baker of Teachers College, Columbia University in the US and Kalina Yacef of the University of Sydney, characterise the goals of EDM research as follows (Baker and Yacef, 2009):

  1. Predicting students' future learning behaviour by creating student models that incorporate detailed information such as students' knowledge, motivation, metacognition, and attitudes;
  2. Discovering or improving domain models that characterise the content to be learned and optimal instructional sequences;
  3. Studying the effects of different kinds of pedagogical support that can be provided by learning software; and
  4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the domain, and the software's pedagogy.

So, next time you hear the words Big Data, think of the data scientists who are using Educational Data Mining to improve how intelligent learning environments help students learn.


Baker, R. S. J. D., and Yacef, K. (2009). ‘The State of Educational Data Mining in 2009: A Review and Future Visions.’ Journal of Educational Data Mining 1(1): 3–17.

What do you think of when you hear the words Educational Data Mining?

What data are you using to inform student learning?