SINCE I TEACH assessment classes at the university and write this Technology column, it makes sense that I should write a column on the intersection of these two topics. I wonder why I didn't think of doing such a column before. Before I get to the intersection of assessment and technology, though, I want to discuss a few assessment fundamentals and clarify a few terms.

Assessment as a discipline has always been concerned with both measurement and evaluation. Measurement is the easy part; evaluation is a bit tricky. Before you evaluate a student, school, or district, you have to consider the "compared to what" question, and therein lie the tricks. Classically, assessment texts have described norm-referenced and criterion-referenced evaluation. In norm-referenced evaluation, you compare a student to a representative sample of similar students across the U.S., known as the norm group. Such tests as the CTBS (Comprehensive Tests of Basic Skills) and the SAT (Stanford Achievement Test) are designed for this purpose. Criterion-referenced evaluation compares a student to a set of objectives, competencies, or standards, usually state standards measured by state tests.

Unfortunately, most assessment texts forget to mention the third approach to evaluation: improvement-referenced evaluation. To use improvement-referenced evaluation, you have to track a student's progress accurately, taking measurements at least three times a year. Most tests designed to provide information for norm- and criterion-referenced evaluation do not work well for improvement-referenced uses, since, for a variety of reasons, they cannot be given reliably three times a year, at least in a paper-and-pencil format.

One of the biggest issues today is grade-level testing. On the surface, it seems logical to give fourth-graders a fourth-grade test. The problem is that some fourth-graders are performing at the second- or third-grade level, and giving them a fourth-grade test yields little useful information. Besides, these below-grade-level students get frustrated by tests that are much too hard for them. Grade-level testing is also frustrating for students performing above grade level. This issue prompts people to advocate or discuss out-of-level testing (OOLT): why not give students performing above or below grade level the tests that are appropriate for students at their level?

The obvious solution to this quandary is to get the computer to custom design tests to fit individuals. This is called Computer Adaptive Testing (CAT), and such tests fit nicely with an improvement-referenced approach to evaluation. A CAT test is simply a test that continuously adjusts the difficulty of items to match a student's performance level. If a student misses an item, a slightly easier one is given; if a student gets an item correct, a slightly more difficult one is given. Since no time is wasted on items or questions that are well above or below a student's ability, relatively few items need to be answered, and testing often takes as little as 10 minutes. Obviously, a computer is necessary to quickly check an item and offer the next one, and a large item bank of questions matched to various levels is required to support such an approach. (A rough sketch of this adaptive loop appears below.)

Computer adaptive tests have numerous advantages. First, every student receives a unique test, adjusted to his or her performance level. This makes cheating virtually impossible. Second, test results can be obtained immediately, and a wide variety of reports can be generated.
Third, a CAT test can be administered one-on-one in a classroom setting or to many students at once in a computer lab. Fourth, with a large item pool, a CAT test can be given regularly -- for example, in August, January, and May. When a CAT test is given regularly, individual student progress can be charted and evaluated. That is, you do not have to wait until you give the state achievement test in late spring to find out if a student is making progress. …
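For readers who want to see the mechanics, here is a minimal sketch of that adjust-up, adjust-down loop in Python. The item bank, the administer_cat function, and the simulated student are all hypothetical illustrations of my own; real CAT systems rely on item response theory to select items and estimate ability, but the core loop is this simple.

```python
import random

# Hypothetical item bank: each difficulty level maps to a pool of questions.
# (A real CAT system would draw on thousands of calibrated items.)
ITEM_BANK = {
    level: [f"level-{level} item #{i}" for i in range(1, 21)]
    for level in range(1, 13)   # difficulty levels 1 (easiest) through 12
}

def administer_cat(ask, start_level=4, num_items=15):
    """Give a short adaptive test.

    `ask` presents one item to the student and returns True for a correct
    answer. After a correct answer the next item is slightly harder; after
    a miss it is slightly easier.
    """
    level = start_level
    history = []
    for _ in range(num_items):
        item = random.choice(ITEM_BANK[level])
        correct = ask(item)
        history.append((level, correct))
        if correct:
            level = min(level + 1, max(ITEM_BANK))   # move up one level
        else:
            level = max(level - 1, min(ITEM_BANK))   # move down one level
    # The level the student settles around serves as a rough performance estimate.
    return history, level

# Example: simulate a student who reliably answers items at level 6 or below.
def simulated_student(item):
    item_level = int(item.split("-")[1].split()[0])
    return item_level <= 6

history, estimate = administer_cat(simulated_student)
print(estimate)   # tends to settle around level 6 or 7 after a handful of items
```

Note how quickly the estimate homes in: because each answer moves the difficulty one step, a student's level emerges after a handful of items, which is why a CAT session can take as little as 10 minutes.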