How do you measure what is going on inside a person’s mind?
Designing assessments is one of the most challenging pieces of education (and vast numbers of classroom teachers are required to do it on their own). There are numerous obstacles on the path to assessments that tell us what students actually understand.
If the assessment task is too simple, it becomes hard to distinguish between understanding and mimicry. We can teach a barely verbal infant, a parrot, and an AI bot to correctly complete the phrase “two plus two equals…” but that doesn’t mean they actually understand what the phrase means.
Even assessments of more complex learning can result in regurgitation rather than actual comprehension. Every student has been in at least one class that could be passed by simply spitting back to the teacher what she had delivered to the class.
The clearer a teacher’s expectations are, the easier the system is to game. But if her expectations are unclear, she will have trouble deciding whether students failed to comprehend the material or simply failed to understand what the assessment required. Finding the sweet spot is difficult.
More complex assessments also run into the question of how much of the students’ troubles are a failure in understanding and how much are trouble in conveying that understanding. If you have asked for an essay answer, did the student come up short because of a lack of understanding, or because of a lack of writing skill? Or did the question itself include some cultural bias that interfered with student understanding of the task?
The common assumption is that if the student really knows the material, she’ll have no trouble expressing her understanding. But how many adults have had the “I know what I want to say, but I can’t quite figure out how to say it so you’ll understand” experience? Many persons have trouble translating understanding into communication, and many other persons can communicate quite effectively about things they do not understand at all.
Add to that the common insistence that students complete a test within a limited time and with no assistance, constraints that rarely apply to problem-solving in the real world. Tests are meant to measure the messy insides of a human mind in neat and orderly units.
Most importantly, we have no way to check if an assessment gives us a perfect understanding of student achievement, because we have no perfect measurement to which it can be compared.
At best we can check validity and reliability. Validity checks to see if we’re measuring what we mean to measure; are we using a ruler or scales to measure the weight of an object? Is the assessment task closely related to that which we want to assess? Reliability checks to see if the test gets consistent results; does the ruler find the same length every time we use it to measure the same object?
When the PARCC tests were launched to assess student achievement of the Common Core standards, a common criticism was that they were not valid tests. Value-added evaluations of teachers based on those test scores turned out to be unreliable, varying from year to year.
For classroom teachers, all of these issues should lead to a constant year-to-year fine tuning of assessments. But for the state’s Big Standardized Tests or tests like the SAT and ACT, tweaking has to be done very carefully, because part of the tests’ selling point is that the results can be compared year to year. States have the additional challenge of setting cut scores; where do they draw the line between proficient and just good enough? That’s a process that may have as much to do with politics as with education.
Witness the current mini-firestorm set off by the College Board “recalibration” of Advanced Placement test scores, which has dramatically increased the number of students with high scores on the tests. Is it grade inflation, or is it “matching the reality” of the grading in the college courses for which AP courses are supposed to count? As several commenters have noted, it’s a threat to the College Board’s marketing to let the mask slip, to admit that the tests are not some objective, perfect measure of student achievement.
The answer, in part, is that the AP tests are, like every other test given to students, an imperfect measure of student achievement that is made up by human beings and administered under artificial circumstances.
None of this means that students should never be assessed. As G. K. Chesterton wrote, anything worth doing is worth doing badly. We have plenty of testing professionals and experienced educators who can help us get closer to the mark. But it’s important to remember that assessments are not handed down via burning bush, perfectly accurate and objective. This is why wise teachers use many and varied assessments, and don’t depend on a single test to perfectly measure the contents of a young human mind.