
How to deal with ‘bad’ questions in a multiple-choice test?

Last updated on 12 August 2024
As an examiner, you sometimes face the challenge of how to deal with ‘bad’ questions in a multiple-choice test. Do you remove them? Do you accept an additional answer option as correct? Or do you count the question as correct for everyone? And how do you deal with the pass-fail score? These four steps help you tackle this systematically.

Basic principles of a good test
The basic premise of testing is that tests are valid and reliable. Roughly speaking, validity means that the test covers the learning objectives of the course. A test matrix can help you determine how many questions to include on each topic, so that you ensure coverage. Reliability means that students' scores are not unduly influenced by chance. To achieve this, the test should contain enough questions that distinguish well between students who have and have not mastered the material or skills. ‘Bad’ questions are not easy to recognize in advance, even if you think you are following all the construction rules for test questions. You can only tell after conducting a test and question analysis of students' answers and scores. Here is a step-by-step guide on how to do that.
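For illustration, a test matrix for a hypothetical 40-question exam might look like this (the topics and weights are invented, not prescribed):

    Topic (learning objective)   Weight   Questions
    Basic concepts               25%      10
    Applying methods             50%      20
    Interpreting results         25%      10
    Total                        100%     40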

Step 1: conduct a test and question analysis 
After administering a test, conduct a test and question analysis. First determine whether the test is sufficiently reliable (Cronbach's alpha > 0.7). If reliability is sufficient, further analysis of the questions is less important. Is reliability low? Then examine the quality of the individual questions, such as their difficulty (p-value: the proportion of students answering correctly) and discrimination (rit value: the question-test correlation). Keep in mind: if the test contains very few questions to begin with, reliability will probably be low and you cannot improve it afterwards. Try to include more questions the next time you design and administer the test.
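A minimal sketch of such an analysis in Python, assuming the answers have already been scored as 0/1 in a students × questions matrix (the data and variable names below are invented for illustration):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def item_stats(scores: np.ndarray):
    """Per question: difficulty (p = proportion correct) and rit (question-test correlation)."""
    totals = scores.sum(axis=1)
    p = scores.mean(axis=0)
    rit = np.array([np.corrcoef(scores[:, j], totals)[0, 1]
                    for j in range(scores.shape[1])])
    return p, rit

# Example: rows are students, columns are questions (1 = correct, 0 = incorrect).
scores = np.array([[1, 0, 1, 1],
                   [1, 1, 1, 0],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
alpha = cronbach_alpha(scores)
p, rit = item_stats(scores)
print(f"alpha = {alpha:.2f}")  # if alpha < 0.7, inspect the individual questions
```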

Step 2: identify questions that incorrectly measure proficiency 
Some questions misrepresent a student's level of knowledge or skill. This can be due to unclear wording on the one hand, or to key errors (scoring the correct answer as wrong, or vice versa) on the other. With unclear wording, students often have to guess, which results in fairly random answers and a low question-test correlation (rit < 0.2). This is usually detrimental to all students' scores. You may therefore decide to score the answers to such a question as correct for all students, or to remove the question entirely.
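Continuing with the `scores` and `rit` arrays from the step 1 sketch, flagging and rescoring such questions could look like this (the 0.2 threshold follows the text; the rest is an assumption):

```python
# Questions with a low question-test correlation (rit < 0.2) are
# candidates for rescoring or removal.
unclear = [j for j in range(scores.shape[1]) if rit[j] < 0.2]

# Option 1: count the question as correct for all students.
rescored = scores.copy()
rescored[:, unclear] = 1

# Option 2: remove the question entirely.
trimmed = np.delete(scores, unclear, axis=1)
```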

Key errors are more problematic. You can often identify these questions because the proportion of students who answer them ‘correctly’ is below the guessing level (for four-choice questions, below 25%, i.e. p < 0.25) and the question-test correlation is negative (rit < 0). You can fix the key for these questions and then recalculate the scores. In order not to penalize students, the advice is to keep counting answers that were marked correct under the wrong key as correct. Removing the question is not necessary. If the score is below chance level but the rit value is greater than 0, it may simply be difficult material, in which case the question is fine.
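A sketch of the same detection pattern in code, again reusing the `p` and `rit` arrays from the step 1 sketch and assuming four-choice questions:

```python
CHANCE = 0.25  # guessing probability with four answer options

# Below the guessing level AND a negative question-test correlation:
# likely a key error. Fix the key, rescore, and keep counting answers
# that were marked correct under the wrong key as correct.
suspect_key = [j for j in range(scores.shape[1])
               if p[j] < CHANCE and rit[j] < 0]

# Below the guessing level but rit > 0: probably just difficult
# material; the question can stay as it is.
hard_but_sound = [j for j in range(scores.shape[1])
                  if p[j] < CHANCE and rit[j] > 0]
```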

Step 3: identify questions that barely measure skill 
Good questions in a test optimally distinguish the degree of knowledge or skill students have. This works best if the questions are neither too easy nor too difficult (0.3 < p < 0.8). Questions that almost every student answers correctly (p-value near 1) provide little information: they do not distinguish well between proficient and less proficient students. At the same time, not all easy questions are irrelevant, because their content may still matter: first, to cover the learning objectives, and second, it may simply be that all students studied well and almost everyone gives the correct answer. For a follow-up test, ideally devise one or two more attractive distractors that less proficient students might choose.
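In code, the flags for this step are one line each, reusing the `p` values from the step 1 sketch (the 0.3 and 0.8 bounds come from the text):

```python
too_easy = [j for j in range(scores.shape[1]) if p[j] > 0.8]  # p near 1: little distinction
too_hard = [j for j in range(scores.shape[1]) if p[j] < 0.3]
# Too-easy questions are not automatically bad: they may be needed to
# cover a learning objective, or the whole cohort may simply have
# studied well. Consider stronger distractors in the next test.
```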

Step 4: review the pass-fail score only if necessary 
If you remove questions, you must redetermine the pass-fail score. Conduct this review primarily to prevent students from failing unfairly. A rule of thumb is therefore that at least no more students should fail than without adjusting the key or removing the questions. If you decide to count a question as correct for everyone, you can leave the pass-fail score as determined beforehand. Afterwards, you do not need to rerun the test and item analysis on the new data, because the originally found values are the most informative. Use the next test to realize improvements.
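One possible way to implement this, continuing with the step 1 sketch: scale the original pass-fail score to the new maximum, then check the rule of thumb that no more students fail than before. This is a sketch under those assumptions, not a prescribed procedure:

```python
removed = [1]  # hypothetical: suppose question 2 was removed in step 2
kept = [j for j in range(scores.shape[1]) if j not in removed]

old_totals = scores.sum(axis=1)
new_totals = scores[:, kept].sum(axis=1)

old_cutoff = 3                                         # hypothetical original pass-fail score
new_cutoff = old_cutoff * len(kept) / scores.shape[1]  # scale to the new maximum

# Rule of thumb: at least no more students should fail than before.
fails_before = (old_totals < old_cutoff).sum()
fails_after = (new_totals < new_cutoff).sum()
while fails_after > fails_before and new_cutoff > 0:
    new_cutoff -= 0.5  # lower the cut-off until the rule of thumb holds
    fails_after = (new_totals < new_cutoff).sum()
```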

Want to know more? 
Read the Quickstart Test Development (PDF) for more details.
