American students sure do take a lot of tests. According to a study quoted in The Washington Post, some students spend nearly a full 24 hours on testing each year. That doesn’t include the time spent preparing for the tests, by the way. There are certainly debates to be had about the appropriateness of such extensive testing, but I want to steer away from the politics. Assuming that this testing is here to stay, at least for the foreseeable future, how can we make the best of it?
Consider the following question: What is the stated purpose of standardized testing? The results of standardized tests can be used to answer many questions, but their basic purpose (that ultimately informs all other applications) is to determine the extent to which a given test taker has mastered a particular concept or skill:
Is test taker A able to demonstrate mastery of skill or concept X?
That’s it. Simple, right?
Well, maybe. But there are a few things that can get in the way.
Take this item for example:
Stem: If a letter is chosen at random from the letters in the word ELEPHANT, what is the probability that the letter chosen will be a vowel?
Correct Answer: (b)
Seems pretty good, but we have to evaluate it against the goal: Is test taker A able to demonstrate mastery of skill or concept X? In this case the stated skill was: Develop a uniform probability model by assigning equal probability to all outcomes, and use the model to determine probabilities of events. The item accomplishes this. Each letter is an outcome, each outcome is equally likely, and based on that, the student can determine the likelihood of a particular event, in this case choosing a vowel. Good. Done. Right?
Well, not so fast. The item does address the stated benchmark, but that’s not all it addresses. There is another key to unlocking this item that has nothing to do with probability or even mathematics: knowing the difference between a vowel and a consonant. Well, that’s OK; everyone knows that, right? But what about English language learners? Or just the kid who missed that day? Test takers could have a perfect grasp of the skill of computing uniform probability but still miss this item because they don’t know what a vowel is (and so have to guess) or because they confuse a vowel with a consonant (leading them to answer choice d). Because the goal of the item is to test mastery of probability, a miss for either reason would produce a faulty result.
We must be sure that the item we write is actually testing the concept we’re trying to measure and nothing else. The argument can certainly be made that in order to write items that address applications, we must include outside material, and I believe that’s true. The point is that we have to be careful about what that outside material is, how accessible it will be to the variety of test takers, and how reasonable it is at the grade level being addressed.
Is testing outside material the only problem with the item? Unfortunately, no. The wording of the stem is ambiguous, which makes it hard to tell what the possible outcomes are. Are we to assume that each letter of the word “ELEPHANT” is written on a card and placed in the proverbial hat (8 cards, with E appearing twice), or are we just to look at the letters and choose from among them (7 choices, because there are 7 different letters)? Under the first reading, 3 of the 8 cards are vowels; under the second, 2 of the 7 letters are. This ambiguity in the stem makes answer choice (a) a possibly correct answer, but it wouldn’t be scored as correct. So, we could have a student who has reasonably interpreted the item and shown mastery of the concept but who would still get the item wrong, all because the item is ambiguously worded.
At this point, the only answer choice that definitely shows misunderstanding of the concept is answer choice (c). I guess we could try to score items to represent that in some way, but it’s much easier if the answer keyed as correct in the scoring guide is the only correct answer and the item tests the content of the benchmark alone.
Both of these issues make it difficult for test takers to demonstrate the mastery that they do, in fact, possess. There are many ways an item can do that, but there are also ways to write items that allow those who don’t have mastery to appear as if they do.
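To see how far apart the two readings land, here is a minimal sketch in Python. The function name `vowel_probability` and the choice to treat only A, E, I, O, and U as vowels are my own assumptions for illustration; nothing here comes from a real scoring guide, and the answer choices themselves are not reproduced above.

```python
from fractions import Fraction

# Assumption for this sketch: only A, E, I, O, U count as vowels.
VOWELS = set("AEIOU")

def vowel_probability(letters):
    """Probability of picking a vowel when every entry in `letters` is equally likely."""
    letters = list(letters)
    favorable = sum(1 for ch in letters if ch in VOWELS)
    return Fraction(favorable, len(letters))

word = "ELEPHANT"

# Reading 1: one card per letter in the hat, so E is counted twice (8 outcomes).
per_card = vowel_probability(word)

# Reading 2: choose among the distinct letters only (7 outcomes).
per_distinct_letter = vowel_probability(sorted(set(word)))

print(per_card)             # 3/8
print(per_distinct_letter)  # 2/7
```

Both numbers come from a correct use of a uniform probability model. The only thing that differs is what the student took the outcomes to be, and the stem doesn’t say.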
Here’s an example of a high school level item that does just that:
Lead-in: Read this citation from Charles Montesquieu and answer the question below.
“But constant experience shows us that every man invested with power is apt to abuse it, and to carry his authority as far as it will go.” — Montesquieu, The Spirit of Laws, Book XI
Stem: How did this citation from Montesquieu influence the content of the U.S. Constitution?
a. by observing that the consolidation of power within a single body predisposes abuse, thus encouraging the separation of powers among governmental branches
b. by establishing the principle that authority of the government is created and sustained by the consent of its people
c. by promoting a form of government in which all eligible citizens participate equally
d. by supporting the rebellion of the common people
Correct Answer: (a)
Sounds good, right? Lots of fancy words. A quote from Montesquieu—nice nod to the Common Core. It even seems to address its stated goal: Demonstrate an understanding of the origins and purposes of government, law, and the American political system.
OK, but do test takers need to know their stuff in order to get it right? Actually, could a sophisticated reader maybe get it right without even looking at the stem?
Here’s the hard truth. Answer choice (a) LOOKS like the right answer. It’s not only longer than the others; it’s also much better developed, both grammatically and conceptually. And let’s face it: we tend to put more thought into the right answer.
But let’s say our test taker didn’t notice that rather obvious difference from the other answer choices. How else does this item, just by the way it is written, help the test taker to get the right answer? Or at least to make a better guess? Take a look at answer choice (d). I may not be able to figure out the right answer immediately, but I can easily eliminate (d) because even someone who can’t decode the quote at all would probably know that the Constitution does not include a “revolution” clause. If I can eliminate (d), my chance of guessing correctly just jumped from 25% (1 in 4) to about 33% (1 in 3), and that better chance has nothing to do with the stated goal of the item. Pretty good. It’s at least enough of a jump to make my score less representative of my actual understanding of the material.
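The arithmetic behind that jump is simple enough to sketch in a few lines of Python. The helper `guess_probability` is a hypothetical name of my own, and the sketch assumes a blind guess among whatever choices remain, with exactly one of them keyed as correct.

```python
from fractions import Fraction

def guess_probability(total_choices, eliminated):
    """Chance that a blind guess is right after ruling out `eliminated` wrong choices.

    Assumes exactly one remaining choice is correct and all are equally likely picks.
    """
    remaining = total_choices - eliminated
    if remaining < 1:
        raise ValueError("at least one choice must remain")
    return Fraction(1, remaining)

before = guess_probability(4, 0)  # all four choices still in play
after = guess_probability(4, 1)   # choice (d) ruled out

print(f"{float(before):.0%} -> {float(after):.0%}")  # 25% -> 33%
```

Each additional implausible distractor pushes the odds further: with two choices eliminated, the same guess is a coin flip.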
So what’s the solution? Write distractors that are as sophisticated as the correct answer and make sure they are all plausible.
After all, these test results are used to evaluate schools, teachers, students, curricula, and more. Faulty results like the ones that can come from the problems shown here have real, tangible consequences. And if we’re going to do all this testing, its results may as well be reliable.