Using Adaptive Comparative Judgement to Assess Mathematics

Ian Jones & Lara Alcock
Loughborough University

Adaptive Comparative Judgement (ACJ) is a method for assessing evidence of student learning that offers an alternative to marking (Pollitt, 2012). It requires no mark schemes, no item scoring and no aggregation of scores into a final grade. Instead, experts are presented with pairs of students’ work and asked to decide, based on the evidence before them, which student has demonstrated the greater mathematical proficiency. The outcomes of many such pairings are then used to construct a scaled rank order of students from least to most proficient.
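To illustrate how a scaled rank order can emerge from win/lose decisions alone, here is a minimal sketch that fits a Bradley-Terry model to a list of pairwise judgements. The choice of model, the function name `fit_bradley_terry` and the toy data are assumptions made for illustration; the text above does not specify which statistical model the e-scape system uses. The sketch also assumes every script wins at least one comparison and loses at least one, since otherwise the estimates diverge.

```python
import math


def fit_bradley_terry(n_scripts, judgements, iterations=500):
    """Return zero-centred log-strengths from (winner, loser) index pairs.

    Hypothetical helper: assumes each script has at least one win and
    at least one loss, so every strength stays finite and positive.
    """
    wins = [0] * n_scripts
    pair_counts = {}
    for winner, loser in judgements:
        wins[winner] += 1
        pair = (min(winner, loser), max(winner, loser))
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

    strengths = [1.0] * n_scripts
    for _ in range(iterations):
        # Standard iterative update: wins divided by the expected
        # number of "exposures" under the current strengths.
        denominators = [0.0] * n_scripts
        for (i, j), count in pair_counts.items():
            share = count / (strengths[i] + strengths[j])
            denominators[i] += share
            denominators[j] += share
        strengths = [wins[k] / denominators[k] for k in range(n_scripts)]
        # Rescale so the strengths do not drift between iterations.
        total = sum(strengths)
        strengths = [s * n_scripts / total for s in strengths]

    logs = [math.log(s) for s in strengths]
    mean = sum(logs) / n_scripts
    return [value - mean for value in logs]


# Toy data (invented): four scripts, every pair judged three times.
judgements = [
    (1, 0), (1, 0), (0, 1),
    (2, 0), (2, 0), (0, 2),
    (3, 0), (3, 0), (3, 0),
    (2, 1), (2, 1), (1, 2),
    (3, 1), (3, 1), (1, 3),
    (3, 2), (3, 2), (2, 3),
]
scores = fit_bradley_terry(4, judgements)
# Script indices ordered from least to most proficient.
rank_order = sorted(range(4), key=lambda k: scores[k])
```

The returned scores sit on a log scale centred on zero, so the gaps between scripts carry information as well as their order, which is the sense in which the rank order is "scaled".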

ACJ is based on a well-established psychophysical principle, called the Law of Comparative Judgement (Thurstone, 1927), which states that people are far more reliable when comparing one thing with another than when making absolute judgements. The reliability of comparative judgements means “subjective” expertise can be put at the heart of assessment while achieving the sound psychometrics normally associated with “objective” mark schemes.

Until recently, comparative judgement was not viable for educational assessment because it is tedious and inefficient: producing a rank order of \(n\) scripts by comparing every possible pair requires \(\frac{n^2-n}{2}\) judgements. However, the development of an adaptive algorithm that intelligently pairs scripts as judgements come in has cut the number of required judgements to around \(6n\).
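As a worked illustration of the difference, take \(n = 200\) scripts, a round figure assumed here purely for the arithmetic:

\[
\frac{n^2 - n}{2} = \frac{200^2 - 200}{2} = 19\,900, \qquad 6n = 6 \times 200 = 1\,200 .
\]

On these figures the adaptive approach needs only around 6% of the judgements that exhaustive pairing would require.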

There are several scenarios where ACJ may be a more suitable method than marking for assessing mathematics. It is well suited to aspects of mathematical education that are intuitive to experts but difficult to specify in a mark scheme, such as “conceptual understanding”. It also offers an attractive approach to peer assessment in which the students themselves compare pairs of their peers’ efforts to explain a mathematical concept.

We recently tested these two applications of ACJ, as implemented by TAG Development’s e-scape system, on a first year module at Loughborough University. Around 200 students sat a written test that was specially designed for the exercise by the course lecturer, as shown in the figure. Their responses were then anonymised and scanned, and the students were allocated 20 comparative judgements each to be completed online in their own time over the course of a week.

To enable us to evaluate the outcomes of the peer assessment, two expert groups independently assessed the written tests using ACJ. The Pearson correlation between the two expert groups, which provides a judge-rejudge measure equivalent to a mark-remark measure in traditional assessment, was acceptably high (>.8). The average Pearson correlation between peers and experts, which can be considered a measure of the validity of the peers’ judgements, was also acceptably high (>.6).
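For readers unfamiliar with the statistic, the following minimal sketch shows how a Pearson correlation between two groups' scale scores for the same scripts could be computed. The function name `pearson` and the numbers are hypothetical and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Invented scale scores for the same five scripts from two judging groups.
expert_group_a = [-1.2, -0.4, 0.1, 0.6, 0.9]
expert_group_b = [-1.0, -0.6, 0.3, 0.4, 0.9]
r = pearson(expert_group_a, expert_group_b)
```

A value of `r` close to 1 indicates that the two groups placed the scripts in very similar positions on the scale.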

These results demonstrate that ACJ performed well as a method of assessment. We assigned student grades based on an expert rank order, but in future we plan to use peer rank orders for assigning summative grades. We are also undertaking further work to help us better understand the judging process itself. First, we are analysing survey and interview data from this and other ACJ studies to understand the cognitive processes involved in judging one piece of work against another. Second, several students reported that they found the judging process challenging but useful for learning. Student learning is a principal motivation for using peer assessment, and in future studies we intend to measure directly the learning gains from judging peers’ work.

Further detail about the initial phase of the study can be found in the Mapping University Mathematics Assessment Practices book, available free at

Pollitt, A. (2012). The method of Adaptive Comparative Judgement. Assessment in Education: Principles, Policy & Practice. DOI: 10.1080/0969594X.2012.665354
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
