Trending Topics

Discriminatory item analysis: A test administrator’s best friend

Analyzing test question performance can help EMS educators identify where students are missing concepts and remove the emotional burden from grading


There is more to an exam than the letter grade participants receive.

Photo/Getty Images

Have you ever had an EMS student who complained about a test question you created? Are you looking for a better way to assess test question validity? What if you could determine the validity of a test question without looking at the question? This article will illustrate how discriminatory item analysis allows instructors to use data, rather than feelings, to validate test questions.

Common test question fallacies

Fallacies:

  • If most of the class missed an item, it must be a difficult question.
  • If most of the class answered an item correctly, it must be an easy question.

Without applying discriminatory item analysis, it is dangerous to jump to these conclusions. There are many situations where most of the class missed an item, yet the item is statistically valid. There are also times when most of the class answered an item correctly, yet the item is statistically invalid.

Fallacies:

  • It takes multiple administrations of an item to validate a question.
  • It is difficult to validate an item the first time it is administered.

Discriminatory item analysis can be applied after the first administration of an item without compromising the validity of the data. That said, the more times an item is administered, the more the cumulative data can help strengthen its validity.


Common test writing definitions

Before delving into discriminatory item analysis, let us define the elements commonly associated with test question construction.

  • Item and/or test item. All the components of a test question
  • Stem. The test question
  • Distractor. An incorrect answer option within a multiple-choice question
  • Key. The correct answer within a multiple-choice question
  • Item analysis. The statistical methods used to assess a test item/question

Calculating item difficulty

Item difficulty is the basic indicator for determining the difficulty level of an item. Item difficulty is calculated by dividing the number of people who answered the item correctly by the number of people who attempted the item. For example, if 78 of the 100 people who attempted the item answered it correctly, the item has a difficulty of 0.78.
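For instructors who export results to a spreadsheet or similar file, this calculation is simple to automate. The following is a minimal sketch in Python; the response list and answer key are hypothetical examples, not data from an actual exam.

```python
# Minimal sketch: item difficulty (proportion correct) for a single test item.
# The responses and answer key below are hypothetical illustrations.

responses = ["B", "B", "C", "B", "A", "B", "B", "D", "B", "B"]  # answers submitted for one item
key = "B"                                                       # the correct answer (the key)

attempted = len(responses)                       # everyone who attempted the item
correct = sum(1 for r in responses if r == key)  # everyone who answered it correctly

difficulty = correct / attempted                 # e.g., 78 correct of 100 attempts -> 0.78
print(f"Item difficulty: {difficulty:.2f}")      # prints 0.70 for the sample data above
```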

The following table illustrates how the index ranges correlate with the degree of item difficulty:

[Table: item difficulty index ranges]

The goal is for item difficulty to fall between 0.40 and 0.60 for most questions within an exam. It is important to recognize that item difficulty, without discriminatory item analysis, can be fuzzy for the following reasons:

  • The item might genuinely be difficult or easy
  • The stem might be poorly written
  • The question might be convoluted, overly complex and/or difficult to follow
  • The distractors might be poorly written


Demographic groups

Discriminatory item analysis starts by breaking all the participants who attempted the exam into three demographic groups based upon their test scores; a sketch of one way to compute these groups follows the list:

  • Upper percentile group. This group should contain 27% of the participants who took the exam (or as close to 27% as possible) and have the highest test scores.
  • Lower percentile group. This group should be equal in size to the upper percentile group (or as close as possible) and have the lowest test scores. If the split is uneven, the extra participants should be weighted toward the upper percentile group.
  • Middle percentile group. This group rounds out the remaining participants, whose test scores fall between the upper and lower percentile groups.
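The grouping is easy to reproduce from a list of exam scores. The sketch below is one possible implementation; the participant IDs and scores are hypothetical, and the rounding rule simply keeps the two tail groups near 27% each, with any imbalance favoring the upper group as described above.

```python
# Minimal sketch: split participants into upper (~27%), middle and lower percentile groups.
# The participant IDs and scores are hypothetical; real data would come from your grade book.

scores = {"P01": 92, "P02": 88, "P03": 85, "P04": 81, "P05": 78, "P06": 74,
          "P07": 71, "P08": 66, "P09": 60, "P10": 55, "P11": 48}

ranked = sorted(scores, key=scores.get, reverse=True)  # participant IDs, highest score first
n = len(ranked)

upper_n = max(1, round(n * 0.27))    # upper group: as close to 27% of participants as possible
lower_n = min(upper_n, n - upper_n)  # lower group: matches the upper group where the roster allows,
                                     # so any imbalance stays weighted toward the upper group

upper = ranked[:upper_n]             # highest scorers
lower = ranked[n - lower_n:]         # lowest scorers
middle = ranked[upper_n:n - lower_n] # everyone in between

print("Upper:", upper)
print("Middle:", middle)
print("Lower:", lower)
```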

Discriminatory item analysis

Discriminatory item analysis refers to how well the data can differentiate between high and low performers for a given test. The general premise behind discriminatory item analysis is that students with higher test scores probably understand the educational concepts better than students with lower scores.

Implementing discriminatory item analysis will help test administrators eliminate much of the fuzziness associated with utilizing item difficulty alone. Once participant scores have been broken up into their three respective groupings, you are ready to analyze the data.
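Analyzing the data for a single item largely boils down to counting how many participants in each group chose the keyed answer. The sketch below illustrates that step; the group rosters, item responses and key are hypothetical.

```python
# Minimal sketch: per-group correct counts for a single test item.
# Group rosters, item responses and the key below are hypothetical examples.

upper = ["P01", "P02", "P03"]   # highest overall test scores
lower = ["P09", "P10", "P11"]   # lowest overall test scores

item_responses = {"P01": "B", "P02": "B", "P03": "C", "P04": "B", "P05": "A", "P06": "B",
                  "P07": "B", "P08": "D", "P09": "C", "P10": "A", "P11": "B"}
key = "B"

def correct_count(group):
    """Number of participants in the group who chose the keyed answer."""
    return sum(1 for p in group if item_responses.get(p) == key)

upper_correct = correct_count(upper)  # on a valid item, expect most of these to be correct
lower_correct = correct_count(lower)  # and expect more misses here

print(f"Upper group correct: {upper_correct}/{len(upper)}")
print(f"Lower group correct: {lower_correct}/{len(lower)}")
```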

Statistically valid items

It is reasonable to conclude that students who score highest on a test probably understand the concepts better than students who score poorly. Since this is the case, students with the highest test scores should be answering questions correctly, and students with the lowest scores should be answering questions incorrectly. Statistical validity occurs whenever more participants within the upper percentile group correctly answer an item when compared to the lower percentile group.

Statistically flat items

Statistically flat items occur when equal numbers of the upper and lower percentile groups missed the item. As a rule of thumb, this becomes an issue when greater than 20% of the participants miss the item. This is a problem because a statistically flat item does not differentiate between high and low performers.

Statistically invalid items

Analyzing data for statistically invalid items is a slightly more involved process. A test item is considered statistically invalid in these circumstances (a sketch that encodes the rules follows the list):

  • Any question where the upper percentile group misses more than the lower percentile group. There should never be a time when the lower performers score better than the high performers.
  • When 50% or greater of the participants missed the question, with 50% or greater of the upper percentile group missing the question. This is referred to as the 50/50 rule. Just because more than half of the participants missed an item doesn’t mean it is a bad question. What makes it a bad question is that more than half of the high performers missed it. Unless you intended the question to be very difficult, you should never expect more than half of your high performers to miss an item.
  • Any statistically flat question where greater than 20% of the participants missed the item. An item should discriminate between high and low performers. If that discrimination doesn’t occur, the item needs to be reworked so it does.
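Taken together with the definitions of valid and flat items above, these rules can be expressed as a simple decision function. The sketch below is one way to encode them; the function name, inputs and example counts are illustrative, while the 20% threshold and the 50/50 rule come from the rules above.

```python
# Minimal sketch: classify a single item as valid, flat or invalid using the rules above.
# Inputs are raw counts; the 20% threshold and the 50/50 rule come from the article.

def classify_item(upper_correct, upper_n, lower_correct, lower_n, total_correct, total_n):
    upper_missed = upper_n - upper_correct
    lower_missed = lower_n - lower_correct
    overall_miss_rate = (total_n - total_correct) / total_n

    # Invalid: the lower performers outscored the upper performers on this item.
    if upper_correct < lower_correct:
        return "invalid (upper group missed more than lower group)"

    # Invalid: 50/50 rule -- at least half of all participants missed it AND at least
    # half of the upper percentile group missed it.
    if overall_miss_rate >= 0.5 and upper_missed >= upper_n / 2:
        return "invalid (50/50 rule)"

    # Flat: the upper and lower groups missed the item in equal numbers.
    if upper_missed == lower_missed:
        if overall_miss_rate > 0.2:
            return "invalid (flat item missed by more than 20% of participants)"
        return "statistically flat (review construction)"

    # Otherwise more upper-group than lower-group participants answered correctly.
    return "statistically valid"

# Hypothetical example: 2 of 3 upper-group and 1 of 3 lower-group participants answered
# correctly, and 6 of 11 participants answered correctly overall.
print(classify_item(2, 3, 1, 3, 6, 11))  # -> statistically valid
```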

Items warranting closer review

There are some items that may have issues with their construction and warrant a closer review. These items tend to fall into the following circumstances; a short sketch that flags them follows the list:

  • Statistically flat questions where less than 20% of the participants missed the question
  • Questions where greater than 50% of the participants missed the question
  • Difficult questions that are statistically validated; a question where greater than 30% of the participants missed it should be categorized as a difficult question
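These review criteria can be automated in the same way. The sketch below returns a list of reasons an item might deserve a second look; the 20%, 30% and 50% thresholds follow the list above, and the function shape and example counts are illustrative only.

```python
# Minimal sketch: flag items that are not necessarily invalid but deserve a closer review.
# The 20%, 30% and 50% thresholds follow the list above; the function shape is illustrative.

def review_flags(upper_missed, lower_missed, total_missed, total_n):
    miss_rate = total_missed / total_n
    flags = []

    if upper_missed == lower_missed and miss_rate < 0.2:
        flags.append("statistically flat, but missed by fewer than 20% of participants")

    if miss_rate > 0.5:
        flags.append("missed by more than 50% of participants; check the stem and distractors")

    if miss_rate > 0.3:
        flags.append("missed by more than 30% of participants; categorize as a difficult item "
                     "if it is otherwise statistically valid")

    return flags

# Hypothetical example: 4 of 11 participants missed the item, one miss in each tail group.
print(review_flags(1, 1, 4, 11))
```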

More than a grade

There is more to an exam than the letter grade participants receive. Discriminatory item analysis will help identify students who truly understand the information. It also helps direct remediation by identifying specifically where lower-performing students are missing concepts. Most importantly, discriminatory item analysis will help instructors minimize the emotional burden associated with grading and with defending the items on their exams.



This article was originally posted Aug. 5, 2020. It has been updated.

Bob Matoba, M.Ed., EMT-P is an associate professor at the College of Central Florida in Ocala. Bob’s career has spanned almost every aspect of the EMS profession, first as an EMT and paramedic for private ambulance companies, then as an EMS coordinator for medical oversight, an EMS system consultant in the private and public sectors, and finally as the EMS chief for a metropolitan fire department. He has made it his mission to educate clinicians, rather than technicians. Bob is a monthly columnist for EMS1.com and has been a featured and contributing author for EMS World Magazine and JEMS.