Abstract

I might be wrong, but for the next decade or so I don't foresee any revolution or major change of paradigm. Instead, I expect further technical sophistication, both of our research and applications. More of the same, thus, but introduced at a much faster speed and higher level of complexity. Am I happy with this? Yes and no. I'm certainly supportive of technical perfection and will keep trying to contribute to it. But the real danger exists in the neglect of our conceptual basis [emphasis added]. I'm sometimes shocked by the blind application of techniques I see. The NCME annual meetings have definitely shown a trend toward more papers with minor technical tweaks. The same holds for our journal articles. If you ask their authors for a motivation, you hardly get an answer. Some of our research areas, I believe, would definitely benefit from a conceptual cleanup. We need technical progress to find solutions to practical problems. But each good solution begins with a correct conceptualization.

I have had the same reaction at many recent NCME conferences when it appears to me that some of my fellow psychometricians see themselves as little more than glorified “test technicians.” A test technician knows exactly how to apply textbook recipes for the purpose of maximizing information functions, estimating linking constants, and conducting DIF analyses. We need people who can do these things, to be sure. But I worry that we do not have enough psychometricians who are capable of also wearing the hat of an “assessment engineer” (a term I am borrowing from Ric Luecht). For an assessment engineer, the front-end design of an assessment is just as important as the back-end analysis of item response data. In fact, without clarity about the aims of the front-end design, the back-end analysis will often be misguided. Being a capable engineer means that you can see the big picture about the intended interpretations and uses of an assessment system.
It means being able to specify claims and assertions about test scores and their transformations that are testable, that is, falsifiable through empirical analysis. Yet it is impossible to be more than a technician if you are incapable of being meta-cognitive about why you have decided to apply a technical solution to a given testing problem in the first place. The conceptual basis for psychometric research is not something that can easily be picked up while reading chapters in a textbook as a graduate student and then stored away for future reference. Indeed, there is no book out there that will give you straightforward answers to the questions above, because lurking below the surface of these questions are some fundamental disagreements about psychometrics and educational measurement. I don't find that troubling at all; what worries me is the possibility that a preponderance of psychometricians in the NCME community think that these disagreements have all been settled. An even greater worry is that there are many who are unaware that the disagreements exist at all. In looking forward to the 100-year anniversary of NCME in 2038, it is important that we take the time to appreciate that many of the psychometric issues that seem so new (e.g., assessments for the “next generation”) are, at heart, variants of the same basic issues that NCME members have been debating for decades. The computers have gotten faster, the software more sophisticated, and theories of learning are more elaborate, but in the end we're all still trying to figure out the best way to characterize and draw inferences about the interaction between an assessor and assessee, a student and a test item, an observed score and a latent variable. And those who refuse to learn the lessons of the past are doomed to repeat them.
I liken the cover of this issue to an episode of the TV show “What Not to Wear.” The top panel recreates the way that I often see state-level trend data displayed when I participate in technical advisory committee meetings. (Data come from a real state, but let's keep it anonymous.) These grouped bar plots are intended to convey two things: aggregate trends in reading proficiency and differences in these trends between white and Hispanic students. I much prefer the use of a line plot to communicate this information, as shown in the bottom panel of the cover graphic. Do you agree? Let me know: [email protected]. Of course, even if you put lipstick on a pig, it's still a pig: the plots show little evidence that successive cohorts of white or Hispanic 10th grade students are showing signs of improvement in reading, and the gap between the two groups has shown little sign of narrowing.
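The contrast between the two cover panels can be sketched in a few lines of matplotlib. The data below are illustrative placeholders only (the actual state's numbers are anonymous), and the panel titles and file name are my own invention:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical percent-proficient values, for illustration only;
# they are NOT the anonymous state's actual trend data.
years = [2014, 2015, 2016, 2017, 2018]
white = [62, 63, 61, 63, 62]
hispanic = [41, 42, 40, 42, 41]

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Top-panel style: grouped bars, where the year-over-year trend
# and the group gap must both be read off bar heights.
width = 0.4
axes[0].bar([y - width / 2 for y in years], white, width, label="White")
axes[0].bar([y + width / 2 for y in years], hispanic, width, label="Hispanic")
axes[0].set_title("Grouped bars")

# Bottom-panel style: one line per group, so the (flat) trend and
# the (persistent) gap are visible at a glance.
axes[1].plot(years, white, marker="o", label="White")
axes[1].plot(years, hispanic, marker="o", label="Hispanic")
axes[1].set_title("Line plot")

for ax in axes:
    ax.set_xlabel("Cohort year")
    ax.legend()
axes[0].set_ylabel("Percent proficient in reading")
fig.savefig("trend_comparison.png")
```

The design point is that a line plot maps each cohort series to a single connected mark, so the eye tracks the trend and the vertical gap directly, whereas grouped bars force a pairwise comparison of heights at every year.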
