To compare the external and concurrent validity of brief, extended, and clinical replication-type role-plays, the heterosocial performance of reliably low-, moderate-, and high-frequency male undergraduate daters was unobtrusively assessed in an empirically derived criterion situation. Analyses focusing on relationships between the role-play tests and the criterion, differentiations between high- and low-frequency daters, and changes in absolute levels of performance across assessments were conducted on global, specific, and physiological measures of skill and anxiety. There were consistent differences in both the external and concurrent validity of the role-plays, with the replication role-play format offering a potentially valid methodology for sampling behavior in specific situations. Unobtrusive and replication role-play assessments differentiated between the extreme groups on specific and global skill measures. Finally, the brief role-play elicited relatively large increases in heart rate.