AI chatbots will be a game-changer for educational assessment

Dr Michael Johnston
24 January, 2023

I hated sitting exams at school and university. I would break out in sweats and get the shakes. Sometimes it was bad enough to make it difficult to write legibly. Despite my exam nerves, though, I got through my undergraduate degree with good enough grades to get into honours in psychology.

Suddenly everything was much easier. Honours was assessed by thesis. I could take my time to gather my thoughts and edit my work to my satisfaction. I graduated second in my year and went on to complete a doctorate.

Exams frequently underestimate the learning of students who suffer from the kind of anxiety that I did. My exam anxiety didn’t, in the end, hold me back. But some students who have learned enough to do well fall short of a passing grade simply because they are overcome by terror in the exam room.

In my time as a university lecturer, I tried to avoid assessing my courses by exam. Recalling my own bad experiences, I had no wish to inflict them on my students.

I wasn’t the only one. There has been a general move away from exams at both school and university levels. Coursework assessment has become much more commonplace. This has, in part, been in recognition that some students (like me) suffer from significant exam anxiety.

Recently, though, I have rethought my position on exams somewhat.

For one thing, evidence from the grading of coursework assessment in NCEA shows that non-exam assessment isn’t working very well. Over the past ten years or so, average grades for coursework assessment have risen steadily. There are some assessments in which Excellence is the most common grade. Over the same period, grades for the external, exam-based assessments have remained stable.

Exams for NCEA are marked by small panels convened by NZQA. There are rigorous processes in place to ensure that marking is consistent from year to year. Unfortunately, there’s no practical way to apply those processes to internal assessment.

Unsurprisingly, students – and schools – prefer coursework assessment to exam-based assessment. Every year, a greater proportion of NCEA results comes from coursework assessments.

Now, however, there’s a new threat to the integrity of coursework assessment: artificially intelligent chatbots.

One of the coursework assessments for NCEA English requires candidates to write a short story about a difficult decision. I thought I’d see how well ChatGPT, a new online chatbot produced by OpenAI, would do with this task. I simply typed “write a short story about a difficult decision” into its command prompt.

Here’s how its response began:

“Once upon a time, there was a young man named Jack. He had always been a hard worker and had just landed his dream job at a successful company. However, as he began to work, he realized that the company was not as ethical as he had thought. They were cutting corners and exploiting their workers.”

The story goes on to describe how Jack is faced with an ethical dilemma: To speak out against the exploitative behaviour and risk his job, or to keep his mouth shut and protect himself. In the end, he decides to speak out. As a result, he loses his job, but he maintains his integrity and feels proud. He goes on to find a new job at a better company.

ChatGPT’s writing is far better than much that I’ve seen from undergraduate university students. The story isn’t exactly inspired, but it’s well structured. It would certainly attract a passing grade in NCEA.

I tried two other experiments. ChatGPT came up with a cogent essay arguing the relative merits of capitalism and communism, and an accurate account of the difference between mass and weight. I concluded that ChatGPT would likely pass NCEA with Excellence.

Online tools like ChatGPT are going to give NZQA, and universities, a real headache. It will become very difficult indeed to prevent students from using these tools to cheat in coursework assessments.

One effect is likely to be a return to exams as a preferred means of assessment. Ideally though, we won’t see a wholesale return to the time-limits and other rigid conditions that characterised exams in the past.

The methods of assessment used in education should be as valid as possible. A valid assessment is one that results in a sound estimate of a student’s capabilities, and that is as uncontaminated as possible by irrelevant factors. Exam anxiety is one such irrelevant factor.

We should also think about how different methods of assessment drive different student behaviour. Preparing for exams typically involves ‘cramming’, meaning that students rote learn information and regurgitate it in the exam room. (Often the information they regurgitate has little to do with the questions they’re actually being asked.)

Rote learning has its place. In every subject there is information that students should just know cold. Doctors need to know the anatomy of the human body, lawyers need to know a raft of case law, and historians need to know about key events in the past.

But, while knowledge is very important, there’s more to a good education than learning raw facts. Being able to use knowledge productively is important too. Time-limited exams are not a very valid format for anything that requires insight, because insight doesn’t occur on cue.

I think NZQA would do well to explore a hybrid between exam-based assessment and its current approach to coursework assessment.

Students could be required to complete all assessments in secure conditions, without access to the internet. That would prevent them from consulting ChatGPT.

NZQA-convened marking panels should be responsible for marking all NCEA assessments. That would take care of the grade-inflation problem.

The issues of exam anxiety and the fact that insight doesn’t occur on command could be addressed by relaxing time restrictions on completing exams. Not having time pressure would have made me far less nervous. Having plenty of time would also allow for more valid assessment of tasks requiring insight or creativity.

Speaking of insight and creativity, that’s where ChatGPT falls down. It works by analysing statistical associations across the vast quantities of text it was trained on, much of it drawn from the internet. It’s an amazing tool, but it can’t think up anything new. Any semblance of creativity in its output is an illusion.

Part of the solution to the problem posed by AI chatbots, then, might be to refocus assessment on innovative application of knowledge rather than just on memorising it.
