Abstract

THERESA J.B. KLINE
Psychological Testing: A Practical Approach to Design and Evaluation
Thousand Oaks, CA: Sage Publications, 2005, 368 pages (ISBN 1-4129-544-3, US$79.95 Hardcover)

Most textbooks on psychological testing cover purposes, history, classical theory, psychometric properties (norms, reliability, validity), test construction, issues such as group differences, heredity and environmental influences, professional and ethical concerns, and a description of the major maximum and typical performance tests in educational, clinical, and industrial settings. Dr. Theresa Kline's book differs in that it emphasizes the practical questions of how to construct tests and evaluate them, with a focus on theory, psychometric properties, and test construction. It fills an important niche. Consistent with its goals, the book is presented sequentially, leading the reader through the logical steps of test construction and evaluation: statistics, construct definition, item writing, required samples for norms (four chapters), classical and modern test theory, reliability and validity (six chapters), and ethical and professional issues and a brief review of selected tests (two chapters).

Chapter 1 describes the problem of measurement, gives a brief history of testing, with an emphasis on the U.S., and a review of statistical concepts (especially correlation and regression), prior knowledge of which is assumed. Although Pearson's product moment correlation coefficient is thoroughly described, Spearman's rank order correlation is not, even though it appears later (p. 186). Levels of measurement are presented, because statistical procedures are related to them. However, it has been argued that level of measurement is less important for statistical analysis than for permissible transformations of raw scores and interpretation of results (Gaito, 1980).
Levels of measurement are mentioned subsequently (e.g., under attitude scale construction, Chapter 2), but more information would help the test consumer to judge which transformation to use and to decide if interpretation should reflect ordinal or interval differences. The statistics review is clear, although it is stated that power analysis shows that about 20 cases are sufficient to detect a moderate relationship (correlation) between two variables (p. 11). However, as Kline herself shows (p. 82), 85 cases are required to detect a moderate correlation (r = .30) with alpha at .05 and power at .80. Secondly, although many variables are normally distributed, it should be pointed out (p. 22) that converting a raw score to a standard z score does not in itself require or guarantee normality. The chapter ends with a very good discussion of two important questions: establishing the meaning of the construct and its relationship to other constructs and operationally defining it with test items. Chapter 2 describes how to create items for a test using empirical, theoretical, and rational approaches, and practical advice is given for searching the literature and contacting experts. Although the term dust bowl empiricism might have been explained, a useful set of nine guiding rules is presented (albeit without reasons), and advice is offered for the number of items per construct. Useful information is given on attitude scale construction, but an example of a standardized maximum or typical performance test might have helped. Chapter 3 explains how to decide on the test taker's response to the items. It covers open- and closed-ended questions, continuous responses, ipsative versus normative scales, and statistical problems with difference and change scores. There are details of statistical calculations for closed-ended questions, but it is not always clear which statistics are inferential and which are measures of effect size (pp. 54-55). 
Otherwise, clear advice is given, particularly for writing distractors to multiple-choice items. Although the section on continuous responses actually covers discrete scales with only one mention of a true continuum (the straight line graphic scale), it contains a useful discussion of rating scales, particularly Likert scales, including the optimal number of scale points and the optimal labelling system. …
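
The sample-size discrepancy noted above for Chapter 1 (about 20 cases on p. 11 versus 85 cases on p. 82) can be checked with a short calculation. The sketch below is not taken from Kline's book; it assumes the common Fisher z approximation for a two-tailed test of a Pearson correlation, and the function names `required_n` and `approx_power` are hypothetical. Under those assumptions it reproduces the figure of 85 cases for r = .30, alpha = .05, and power = .80.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation r.

    Assumption: Fisher z approximation, two-tailed test,
    n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    n = ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)


def approx_power(r, n, alpha=0.05):
    """Approximate power for a given n (same approximation).

    The negligible contribution of the opposite tail is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.atanh(r) * math.sqrt(n - 3) - z_alpha)


print(required_n(0.30))        # 85
print(approx_power(0.30, 20))  # far below the .80 target
```

With only 20 cases the approximate power for r = .30 falls well short of .80, which is consistent with the reviewer's objection to the figure on p. 11.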
