Face Validity: A Critical but Ignored Component of Scale Construction in Psychological Assessment

Mark S. Allen (School of Psychology and Therapeutic Studies, Leeds Trinity University, UK), Davina A. Robson (School of Psychology, University of Wollongong, Australia), and Dragos Iliescu (Faculty of Psychology and Educational Sciences, University of Bucharest, Romania)

European Journal of Psychological Assessment (2023), 39(3), 153–156. Published online May 31, 2023. https://doi.org/10.1027/1015-5759/a000777

Face validity continues to cause a lot of confusion. Some of this confusion comes from the different meanings attached to the term "face validity" across different areas of science. In this editorial, we aim to provide some clarity regarding what face validity is, what it is not, and how we might go about measuring face validity in psychological assessment.

Defining Face Validity

The concept of face validity has been around for a long time. It was first comprehensively discussed in a review article that outlined the various definitions of face validity and showed how these multiple meanings had led to vastly different conclusions about what face validity actually is (Mosier, 1947). It is striking that, 75 years later, we continue to face the same problem. Journal articles often quote the following definition:

"…the term 'face validity' implies that a test which is to be used in a practical situation should, in addition to having pragmatic or statistical validity, appear practical, pertinent and related to the purpose of the test as well; i.e., it should not only be valid, but it should also appear valid." (Mosier, 1947, p. 192, emphasis retained)

However, this passage comes from Mosier's (1947) discussion of only one of four different usages of the term face validity: one in which face validity is used to mean the appearance of validity. The passage was not intended as a correct or all-encompassing definition of face validity but rather as a basic description of one of the many ways the term had been used by researchers. Mosier (1947) noted that the term face validity had also been used to mean validity by assumption (the assumption of validity is so strong that statistical validation is deemed unnecessary), validity by definition (the objective of the test is defined relative to the measure), and validity by hypothesis (the test must meet an immediate practical need). Since the publication of that article in 1947, the term face validity is no longer used to refer to validity by definition or validity by hypothesis. However, the term continues to be used, in some areas of science, to mean validity by assumption.
Mosier (1947) heavily criticized the idea of validity by assumption, arguing that it is not a legitimate form of validity, whereas his discussion of the "appearance of validity" referred largely to the idea that a test should appear meaningful to those taking it (Mosier, 1947). This is still considered a critical component of face validity.

If you type "face validity" into a Google search you will find multiple definitions with some level of overlap, and the scientific literature is not too dissimilar. For example, researchers have suggested that face validity is when experts look at a measure and decide whether it is appropriate based on their own observation (Kidder, 1982), that face validity is afforded to measures that have not yet demonstrated "superior" levels of validity (Turner, 1979), that face validity refers to whether a test looks valid to the person taking it (Bornstein, 1996; Nevo, 1985), that face validity is the suitability of the content of a test as perceived by test-takers (Secolsky, 1987), and that face validity is the perceived accuracy, likability, and relevance of a test as perceived by test-takers (Thomas et al., 1992). In psychology, most researchers have come to understand face validity as test-takers' perspectives on the test itself (such as its meaningfulness and understandability), whereas in other fields researchers have often taken face validity to mean "assumed validity," or have read the description "appearance of validity" (Mosier, 1947) in the literal sense that the measure only appears valid. Using the term face validity to mean "validity by assumption," and presenting this as a legitimate form of validity, has (quite correctly) attracted considerable criticism in fields such as medicine (Downing, 2006; Royal, 2016).

It is notable that these criticisms of face validity (Downing, 2006; Royal, 2016) are, in effect, criticisms of validity by assumption. Some have argued that when researchers use their intuition to decide whether a test is valid (validity by assumption), the term face validity should be replaced by hypothesized validity (Nevo, 1985). One problem with review articles on face validity is that they tend to reach no real conclusion about a correct definition (e.g., Mosier, 1947; Turner, 1979), which has probably contributed to the continued confusion surrounding the concept. Perhaps the most up-to-date definition of face validity is the following:

"…face validity is the appropriateness, sensibility, or relevance of the test and its items as they appear to the persons answering the test… More formally, face validity is defined as the degree to which test respondents view the content of a test and its items as relevant to the context in which the test is being administered." (Holden, 2010, p. 637)

This definition includes a critical component of face validity, the perceived relevance of the test to its intended audience, but omits other critical components such as item ambiguity and ease of answering. A recent exploration of face validity as a concept (Connell et al., 2018) also highlighted the importance of establishing whether the target audience finds items distressing or judgmental (e.g., items such as "I am stupid" or "I am ugly" in personality assessment; see Holden & Jackson, 1979). From the literature reviewed, we can establish a working definition: face validity refers to the clarity, relevance, difficulty, and sensitivity of a test to its intended audience. Moreover, for a test to have demonstrated evidence of face validity, the target population needs to confirm that items are clear and understandable, are relevant to them, are easy to answer, and are not judgmental, intrusive, or distressing.
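As a purely illustrative sketch, this working definition can be operationalized as a small set of per-item ratings collected from the target population. The rating structure, field names, and 5-point scales below are our assumptions for illustration, not a published instrument.

```python
# Illustrative sketch (our assumption, not a published instrument):
# per-item face-validity ratings structured around the four components
# of the working definition (clarity, relevance, difficulty, sensitivity).

from dataclasses import dataclass

@dataclass
class FaceValidityRating:
    """One respondent's face-validity ratings for one test item."""
    item_id: str
    clarity: int         # 1-5: is the item clear and understandable?
    relevance: int       # 1-5: is the item relevant to the respondent?
    ease_of_answer: int  # 1-5: is the item easy to answer? (difficulty)
    sensitivity: int     # 1-5: is the item distressing, intrusive,
                         #      or judgmental? (sensitivity)

# Example: one member of the target population rates one item.
rating = FaceValidityRating(
    item_id="item_01", clarity=5, relevance=4, ease_of_answer=4, sensitivity=1
)
print(rating)
```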
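To make the expert-rating step concrete, here is a minimal sketch loosely modeled on the screening procedure described for the iPBI (Turner et al., 2018). The item wordings, ratings, and the retention cut-off of 3.0 are illustrative assumptions, not values from the original study.

```python
# Minimal sketch of expert-judge content-validity screening, loosely
# modeled on the procedure described for the iPBI (Turner et al., 2018).
# All items, ratings, and the cut-off below are invented for illustration.

from statistics import mean

# Three expert judges rate each candidate item from 1 (poor) to
# 5 (excellent) for fit to the construct definition.
judge_ratings = {
    "I absolutely must perform well": [5, 4, 5],
    "Failing would be unbearable":    [4, 4, 3],
    "I enjoy competing":              [2, 1, 2],  # poor construct fit
}

CUTOFF = 3.0  # hypothetical retention threshold (mean judge rating)

retained = [item for item, r in judge_ratings.items() if mean(r) >= CUTOFF]
removed = [item for item in judge_ratings if item not in retained]

print("Retained items:", retained)
print("Removed (poor content validity):", removed)
```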
Methods of Assessing Face Validity

Face validity can be established using qualitative or quantitative methods. Qualitative approaches typically involve one-to-one interviews or focus groups in which participants are presented with individual items and asked to share their thoughts on each one. For example, one study tested the face validity of the Recovering Quality of Life questionnaire (ReQoL), which was developed to be suitable for mental health service users over the age of 16 (Connell et al., 2018). In a sample of 76 adults (aged 16+) with a mental health diagnosis, 55 participated in individual interviews, 11 attended two focus groups (of seven and four participants), and 6 interviews took place with two participants together. In each case, items from the ReQoL were presented one at a time, and participants were asked whether each item was meaningful and relevant to their quality of life; whether it was clear, understandable, and easy to answer; and why they liked or disliked an item or preferred it to another. Thematic analysis was then used to establish reasons for participants liking or disliking particular items (Connell et al., 2018). Qualitative approaches are valuable because they provide a rich source of information and can tap into unexpected issues associated with individual items.

Quantitative approaches to establishing face validity differ considerably in both sample size and level of detail. For example, in one study of work-related stress, a pilot sample of 7 participants completed a 21-item questionnaire and were asked to leave notes, written or oral, concerning the items (Frantz & Holmgren, 2019). Face validity was established by all 7 participants confirming that questions were generally easy to answer and relevant to work stress. In another study, developing a new measure of attribution (Greenlees et al., 2005), 20 participants completed a 16-item questionnaire and rated the degree to which they understood each item from 1 = not at all understandable/readable to 5 = very understandable/readable. The criterion for discarding an item was a score of 1 or 2 from any participant (all items were scored 3–5 by all participants, and all items were retained). Finally, in validating an established 9-item measure of rejection sensitivity (Mishra & Allen, 2023), 43 participants rated each of the nine items on five components of face validity: on a 5-point scale, they rated whether they considered the item relevant to them, clear, easy to answer, distressing, or judgmental. Face validity was established by each item (and the mean of the 9 items) averaging below the midpoint of the scale. There is little guidance on cut-off values for retaining items in tests of face validity (Hardesty & Bearden, 2004), and establishing cut-off conventions is an important next step in the development of face validity theory.
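As a rough illustration of how such quantitative decision rules differ, the sketch below applies two of the retention criteria described above to invented rating data: the "discard if any participant rates an item 1 or 2" rule (after Greenlees et al., 2005) and the "item mean below the scale midpoint" rule (after Mishra & Allen, 2023). The data, and the assumption that the second rule's components are all coded so that lower scores indicate better face validity, are ours.

```python
# Sketch of two quantitative face-validity retention rules described
# in the text. All rating data below are invented for illustration.

import numpy as np

# Rule 1 (after Greenlees et al., 2005): understandability rated from
# 1 (not at all) to 5 (very understandable); discard an item if ANY
# participant rates it 1 or 2. Rows = items, columns = participants.
understandability = np.array([
    [5, 4, 5, 3],   # item A: lowest rating is 3 -> retain
    [4, 2, 5, 5],   # item B: one rating of 2 -> discard
])
retain_rule1 = (understandability >= 3).all(axis=1)

# Rule 2 (after Mishra & Allen, 2023): each item rated on several
# face-validity components; here we assume (our assumption) that all
# components are coded so that LOWER scores indicate better face
# validity, and an item is retained if its mean falls below the
# midpoint of the 5-point scale.
MIDPOINT = 3.0
component_ratings = np.array([
    [1.8, 2.1, 1.5],   # item A: mean 1.8 -> retain
    [3.4, 3.9, 3.2],   # item B: mean 3.5 -> discard
])
retain_rule2 = component_ratings.mean(axis=1) < MIDPOINT

print("Rule 1 retain:", retain_rule1)   # [ True False]
print("Rule 2 retain:", retain_rule2)   # [ True False]
```

A per-participant rule such as the first is stricter in small samples, whereas mean-based rules smooth over individual outliers; which is more appropriate will depend on the test and its intended audience.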
The nature of the test itself might also shape whether researchers adopt qualitative or quantitative approaches to assessing face validity. For example, some psychological measures might be perceived to have a performance component (e.g., personality assessments in the workplace), and if test-takers believe they have performed poorly on the test, they might be more inclined to rate the test itself as poor (a self-serving bias; see Miller & Ross, 1975). In such instances, qualitative approaches might be more useful for establishing the underlying reasons for poor evaluations of a test. It is also important to acknowledge that a test is more than just the test itself (Greiff & Iliescu, 2017), and face validity assessments often need to go beyond exploring individual items. For example, one pilot study explored two stems (i.e., not the actual items) for a questionnaire measure of collective efficacy (Short et al., 2002). The first stem assessed individual perceptions of team collective efficacy ("rate your confidence that your team…"), and the second asked respondents to rate their team's perception of collective efficacy ("rate your team's confidence…"). Even though theory suggested that the second stem should be more correct (Bandura, 1997), the first stem was adopted for the questionnaire because it was considered easier for test-takers to understand (Short et al., 2005).

Importance of Face Validity

The importance of a psychological test having high face validity relates to (1) the quality of the data collected and (2) the experiences of test-takers. A questionnaire with high face validity will naturally lead to better-quality data. For example, test-takers typically do not want to displease those delivering the test and so will often respond to items (as best they can) regardless of whether they have fully understood the question being asked. Data will therefore be of poorer quality if participants are responding to items that are confusing or unclear. Furthermore, participants might be less willing to complete all items if they find them distressing, intrusive, or judgmental, resulting in a smaller (or less representative) pool of data. In terms of the experiences of test-takers, participants will be more satisfied when completing tests that are relevant to them, easy to answer, and not judgmental, intrusive, or distressing. Conversely, a measure with low face validity might leave participants confused about what is being measured and/or frustrated at struggling to answer items (potentially leading to feelings of shame or embarrassment). These negative experiences could affect subsequent responses on the test itself, such as lower levels of effort (e.g., "these questions make no sense to me, I will just give some random answers"), and are likely to reduce participants' willingness to volunteer their time in further studies.

Conclusion

This editorial has provided a brief outline of the nature of face validity that we hope will become a stable feature of all scale construction studies in psychological assessment. To summarize: face validity is not hypothetical or assumed validity, it is not a general appearance of validity, and it is not content validity. Face validity is a legitimate form of validity that is reflected in the clarity, relevance, difficulty, and sensitivity of a measure to its intended audience. Face validity can be assessed using qualitative or quantitative methods and must be assessed in the population of interest. Establishing face validity is an essential part of developing new psychological assessment tools and is particularly important when developing assessments for specific populations (e.g., children, or clinical samples such as adults with intellectual disabilities). Provided it is assessed well, and according to modern (correct) definitions, face validity is not an inferior form of validity but an essential component of scale construction. We hope this editorial encourages authors to include face validity tests when developing new measures and when exploring the validity of established measures. In particular, we hope to see more research exploring the face validity of adapted tests (across populations, cultures, and world regions), where certain words might invoke different meanings and interpretations, or be considered more or less sensitive (e.g., intrusive) in diverse populations.

References
Bandura, A. (1997). Self-efficacy: The exercise of control. W.H. Freeman and Company.

Bornstein, R. F. (1996). Face validity in psychological assessment: Implications for a unified model of validity. American Psychologist, 51(9), 983–984. https://doi.org/10.1037/0003-066X.51.9.983

Connell, J., Carlton, J., Grundy, A., Taylor Buck, E., Keetharuth, A. D., Ricketts, T., Barkham, M., Robotham, D., Rose, D., & Brazier, J. (2018). The importance of content and face validity in instrument development: Lessons learnt from service users when developing the Recovering Quality of Life measure (ReQoL). Quality of Life Research, 27(7), 1893–1902. https://doi.org/10.1007/s11136-018-1847-y

Downing, S. M. (2006). Face validity of assessments: Faith-based interpretations or evidence-based science? Medical Education, 40, 7–8. https://doi.org/10.1111/j.1365-2929.2005.02361.x

Frantz, A., & Holmgren, K. (2019). The Work Stress Questionnaire (WSQ) – Reliability and face validity among male workers. BMC Public Health, 19, Article 1580. https://doi.org/10.1186/s12889-019-7940-5

Greenlees, I., Lane, A., Thelwell, R., Holder, T., & Hobson, G. (2005). Team-referent attributions among sport performers. Research Quarterly for Exercise and Sport, 76(4), 477–487. https://doi.org/10.1080/02701367.2005.10599321

Greiff, S., & Iliescu, D. (2017). A test is much more than just the test itself: Some thoughts on adaptation and equivalence. European Journal of Psychological Assessment, 33(3), 145–148. https://doi.org/10.1027/1015-5759/a000428

Hardesty, D. M., & Bearden, W. O. (2004). The use of expert judges in scale development: Implications for improving face validity of measures of unobservable constructs. Journal of Business Research, 57(2), 98–107. https://doi.org/10.1016/S0148-2963(01)00295-8

Holden, R. B. (2010). Face validity. In I. B. Weiner & W. E. Craighead (Eds.), The Corsini encyclopedia of psychology (4th ed., pp. 637–638). Wiley.

Holden, R. R., & Jackson, D. N. (1979). Item subtlety and face validity in personality assessment. Journal of Consulting and Clinical Psychology, 47(3), 459–468. https://doi.org/10.1037/0022-006X.47.3.459

Kidder, L. H. (1982). Face validity from multiple perspectives. New Directions for Methodology of Social & Behavioral Science, 12, 41–57.

Miller, D. T., & Ross, M. (1975). Self-serving biases in the attribution of causality: Fact or fiction? Psychological Bulletin, 82(2), 213–225. https://doi.org/10.1037/h0076486

Mishra, M., & Allen, M. S. (2023). Face, construct and criterion validity, and test-retest reliability, of the Adult Rejection Sensitivity Questionnaire. European Journal of Psychological Assessment. Advance online publication. https://doi.org/10.1027/1015-5759/a000782

Mosier, C. I. (1947). A critical examination of the concepts of face validity. Educational and Psychological Measurement, 7(2), 191–205. https://doi.org/10.1177/001316444700700201

Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287–293. https://doi.org/10.1111/j.1745-3984.1985.tb01065.x

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. McGraw-Hill.

Royal, K. (2016). "Face validity" is not a legitimate type of validity evidence! The American Journal of Surgery, 212(5), 1026–1027. https://doi.org/10.1016/j.amjsurg.2016.02.018

Secolsky, C. (1987). On the direct measurement of face validity: A comment on Nevo. Journal of Educational Measurement, 24(1), 82–83. https://www.jstor.org/stable/1434528

Short, S. E., Apostal, K., Harris, C., Poltavski, D., Young, J., Zostautas, N., & Feltz, D. L. (2002). Assessing collective efficacy: A comparison of two approaches. Journal of Sport and Exercise Psychology, 24, S115.

Short, S. E., Sullivan, P., & Feltz, D. L. (2005). Development and preliminary validation of the collective efficacy questionnaire for sports. Measurement in Physical Education and Exercise Science, 9, 181–202. https://doi.org/10.1207/s15327841mpee0903_3

Thomas, S. D., Hathaway, D. K., & Arheart, K. L. (1992). Face validity. Western Journal of Nursing Research, 14(1), 109–112. https://doi.org/10.1177/019394599201400111

Turner, M. J., Allen, M. S., Slater, M. J., Barker, J. B., Woodcock, C., Harwood, C. G., & McFayden, K. (2018). The development and initial validation of the Irrational Performance Beliefs Inventory (iPBI). European Journal of Psychological Assessment, 34(3), 174–180. https://doi.org/10.1027/1015-5759/a000314

Turner, S. P. (1979). The concept of face validity. Quality and Quantity, 13(1), 85–90.
