ABSTRACT

Introduction: Research to date has supported the use of multiple performance validity tests (PVTs) for determining validity status in clinical settings. However, the implications of including versus excluding patients who fail one PVT remain a source of debate, and methodological guidelines for PVT research are lacking. This study evaluated three validity classification approaches (i.e. 0 vs. ≥2, 0–1 vs. ≥2, and 0 vs. ≥1 PVT failures) against three reference standards (i.e. criterion PVT groupings) to identify the approaches best suited to establishing validity groups in PVT research methodology.

Method: A mixed clinical sample of 157 patients was administered freestanding (Medical Symptom Validity Test, Dot Counting Test, Test of Memory Malingering, Word Choice Test) and embedded (Reliable Digit Span, RAVLT Effort Score, Stroop Word Reading, BVMT-R Recognition Discrimination) PVTs during outpatient neuropsychological evaluation. Three reference standards were created, each comprising two freestanding and three embedded PVTs from the above list. The Rey 15-Item Test and RAVLT Forced Choice were used solely as outcome measures, in addition to the two freestanding PVTs not employed in a given reference standard. Receiver operating characteristic (ROC) curve analyses evaluated classification accuracy under each of the three validity classification approaches for each reference standard.

Results: When patients failing only one PVT were excluded or classified as valid, classification accuracy ranged from acceptable to excellent. However, classification accuracy was poor to acceptable when patients failing one PVT were classified as invalid. Sensitivity/specificity remained reasonably stable across two of the validity classification approaches (0 vs. ≥2; 0–1 vs. ≥2).

Conclusions: These results indicate that both inclusion and exclusion of patients failing one PVT are acceptable approaches to PVT research methodology, and the choice of method likely depends on the study rationale. However, including such patients in the invalid group yields unacceptably poor classification accuracy across several psychometrically robust outcome measures and is therefore not recommended.
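The core analytic step described above, ROC classification accuracy of an outcome measure under each of the three validity-group definitions, can be sketched in code. This is a minimal illustration only: the patient records, scores, and failure counts below are fabricated for demonstration and are not the study's data; the rank-based AUC function (equivalent to the Mann-Whitney U statistic divided by the number of valid-invalid pairs) stands in for full ROC software.

```python
def auc(valid_scores, invalid_scores):
    """Rank-based AUC: probability that a randomly drawn invalid case scores
    below a randomly drawn valid case on the outcome measure (ties count 0.5).
    Lower scores are assumed to indicate poorer (less valid) performance."""
    pairs = [(v, i) for v in valid_scores for i in invalid_scores]
    wins = sum(1.0 if i < v else 0.5 if i == v else 0.0 for v, i in pairs)
    return wins / len(pairs)

# Fabricated patients: (number of criterion PVTs failed, outcome-measure score)
patients = [(0, 48), (0, 50), (0, 45), (1, 46), (1, 49),
            (2, 30), (2, 33), (3, 25)]

def group(patients, approach):
    """Assign patients to valid/invalid groups under one of the three
    classification approaches from the abstract:
      "exclude_1"   -> 0 failures valid, >=2 invalid, 1 failure excluded
      "one_valid"   -> 0-1 failures valid, >=2 invalid
      "one_invalid" -> 0 failures valid, >=1 invalid
    """
    valid, invalid = [], []
    for fails, score in patients:
        if approach == "exclude_1":
            if fails == 0:
                valid.append(score)
            elif fails >= 2:
                invalid.append(score)
        elif approach == "one_valid":
            (valid if fails <= 1 else invalid).append(score)
        elif approach == "one_invalid":
            (valid if fails == 0 else invalid).append(score)
    return valid, invalid

# In this toy data, classifying one-failure patients as invalid drags the
# AUC down, mirroring the pattern the abstract reports.
for approach in ("exclude_1", "one_valid", "one_invalid"):
    v, i = group(patients, approach)
    print(approach, round(auc(v, i), 2))
```

Running the sketch shows the expected pattern: the first two groupings separate the synthetic groups cleanly, while treating one-failure patients as invalid mixes overlapping scores into the invalid group and lowers the AUC.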