
How to deal with ‘bad’ questions in a multiple-choice test?

Last updated on 12 August 2024
As an examiner, you sometimes face the challenge of how to deal with ‘bad’ questions in a multiple-choice test. Do you remove them? Do you accept an additional answer option as correct? Or do you count the question as correct for everyone? And how do you deal with the pass-fail score? These four steps help you tackle this systematically.

Basic principles of a good test
The basic premise of testing is that tests are valid and reliable. Roughly speaking, validity means that the test covers the learning objectives of the course. A test matrix can help you determine how many questions to include on each topic, so that you ensure coverage. Reliability means that students' scores are not unduly influenced by chance. To achieve this, the test should contain enough questions that distinguish well between students who have and have not mastered the material or skills. ‘Bad’ questions are not easy to recognize in advance, even if you think you are following all the construction rules for test questions. You can only tell after conducting a test and question analysis of students' answers and scores. Here is a step-by-step guide on how to do that.
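For illustration, a test matrix for a hypothetical 40-question exam might look like this (the topics and weights are invented, not prescribed):

    Topic (learning objective)   Weight   Questions
    Basic concepts               25%      10
    Applying methods             50%      20
    Interpreting results         25%      10
    Total                        100%     40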

Step 1: conduct a test and question analysis 
After administering a test, conduct a test and question analysis. First determine whether the test is sufficiently reliable (Cronbach's alpha > 0.7). If reliability is sufficient, further analysis of the questions is less important. Is reliability low? Then examine the quality of the individual questions, such as their difficulty (p-value: the proportion of students answering correctly) and discrimination (rit value: the question-test correlation). Keep in mind: if the test contains very few questions to begin with, reliability will probably be low and you cannot improve it afterwards. Try to include more questions the next time you design and administer the test.
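A minimal sketch of such an analysis in Python, assuming the answers have already been scored as 0/1 in a students × questions matrix (the data and variable names below are invented for illustration):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def item_stats(scores: np.ndarray):
    """Per question: difficulty (p = proportion correct) and rit (question-test correlation)."""
    totals = scores.sum(axis=1)
    p = scores.mean(axis=0)
    rit = np.array([np.corrcoef(scores[:, j], totals)[0, 1]
                    for j in range(scores.shape[1])])
    return p, rit

# Example: rows are students, columns are questions (1 = correct, 0 = incorrect).
scores = np.array([[1, 0, 1, 1],
                   [1, 1, 1, 0],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
alpha = cronbach_alpha(scores)
p, rit = item_stats(scores)
print(f"alpha = {alpha:.2f}")  # if alpha < 0.7, inspect the individual questions
```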

Step 2: identify questions that incorrectly measure proficiency 
Some questions misrepresent a student's level of knowledge or skill. This can be due to unclear wording on the one hand, or to key errors (scoring the correct answer as wrong, or vice versa) on the other. With unclear wording, students often have to guess, which results in fairly random answers and a low question-test correlation (rit < 0.2). This is usually detrimental to all students' scores. You may therefore decide to score the answers to such a question as correct for all students, or to remove the question entirely.
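Continuing with the `scores` and `rit` arrays from the step 1 sketch, flagging and rescoring such questions could look like this (the 0.2 threshold follows the text; the rest is an assumption):

```python
# Questions with a low question-test correlation (rit < 0.2) are
# candidates for rescoring or removal.
unclear = [j for j in range(scores.shape[1]) if rit[j] < 0.2]

# Option 1: count the question as correct for all students.
rescored = scores.copy()
rescored[:, unclear] = 1

# Option 2: remove the question entirely.
trimmed = np.delete(scores, unclear, axis=1)
```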

Key errors are more problematic. You can often identify these questions because the proportion of students who answer them ‘correctly’ is below the guessing level (for four-choice questions, below 25%, i.e. p < 0.25) and the question-test correlation is negative (rit < 0). You can fix the key for these questions and then recalculate the scores. In order not to penalize students, the advice is to keep counting answers that were marked correct under the wrong key as correct. Removing the question is not necessary. If the score is below chance level but the rit value is greater than 0, it may simply be difficult material, in which case the question is fine.
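A sketch of the same detection pattern in code, again reusing the `p` and `rit` arrays from the step 1 sketch and assuming four-choice questions:

```python
CHANCE = 0.25  # guessing probability with four answer options

# Below the guessing level AND a negative question-test correlation:
# likely a key error. Fix the key, rescore, and keep counting answers
# that were marked correct under the wrong key as correct.
suspect_key = [j for j in range(scores.shape[1])
               if p[j] < CHANCE and rit[j] < 0]

# Below the guessing level but rit > 0: probably just difficult
# material; the question can stay as it is.
hard_but_sound = [j for j in range(scores.shape[1])
                  if p[j] < CHANCE and rit[j] > 0]
```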

Step 3: identify questions that barely measure skill 
Good questions in a test optimally distinguish the degree of knowledge or skill students have. This works best if the questions are neither too easy nor too difficult (0.3 < p < 0.8). Questions that almost every student answers correctly (p-value near 1) provide little information: they do not distinguish well between proficient and less proficient students. At the same time, not all easy questions are irrelevant, because their content may still matter: first, to cover the learning objectives, and second, it may simply be that all students studied well and almost everyone gives the correct answer. For a follow-up test, ideally devise one or two more attractive distractors that less proficient students might choose.
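In code, the flags for this step are one line each, reusing the `p` values from the step 1 sketch (the 0.3 and 0.8 bounds come from the text):

```python
too_easy = [j for j in range(scores.shape[1]) if p[j] > 0.8]  # p near 1: little distinction
too_hard = [j for j in range(scores.shape[1]) if p[j] < 0.3]
# Too-easy questions are not automatically bad: they may be needed to
# cover a learning objective, or the whole cohort may simply have
# studied well. Consider stronger distractors in the next test.
```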

Step 4: review the pass-fail score only if necessary 
If you remove questions, you must redetermine the pass-fail score. Conduct this review primarily to prevent students from failing unfairly. A rule of thumb is therefore that at least no more students should fail than without adjusting the key or removing the questions. If you decide to count a question as correct for everyone, you can leave the pass-fail score as determined beforehand. Afterwards, you do not need to rerun the test and item analysis on the new data, because the originally found values are the most informative. Use the next test to realize improvements.
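One possible way to implement this, continuing with the step 1 sketch: scale the original pass-fail score to the new maximum, then check the rule of thumb that no more students fail than before. This is a sketch under those assumptions, not a prescribed procedure:

```python
removed = [1]  # hypothetical: suppose question 2 was removed in step 2
kept = [j for j in range(scores.shape[1]) if j not in removed]

old_totals = scores.sum(axis=1)
new_totals = scores[:, kept].sum(axis=1)

old_cutoff = 3                                         # hypothetical original pass-fail score
new_cutoff = old_cutoff * len(kept) / scores.shape[1]  # scale to the new maximum

# Rule of thumb: at least no more students should fail than before.
fails_before = (old_totals < old_cutoff).sum()
fails_after = (new_totals < new_cutoff).sum()
while fails_after > fails_before and new_cutoff > 0:
    new_cutoff -= 0.5  # lower the cut-off until the rule of thumb holds
    fails_after = (new_totals < new_cutoff).sum()
```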

Want to know more? 
Read the Quickstart Test Development (PDF) for more details.
