Social scientists frequently study complex constructs. Despite the plethora of measures available for these constructs, researchers may need to create their own measure for a particular study. When a measure is created, psychometric testing is required, and the first step is to study the content validity of the measure. The purpose of this article is to demonstrate how to conduct a content validity study, including how to elicit the most from a panel of experts by collecting specific data. Instructions for calculating a content validity index, a factorial validity index, and an interrater reliability index, along with a guide for interpreting these indices, are included. Implications regarding the value of conducting a content validity study for practitioners and researchers are discussed.

Key words: constructs; content validity; measure; psychometric testing

**********

Researchers in the social sciences study complex constructs for which valid and reliable measures are needed. These measures should be brief, clear, and easy to administer. Measures that are too long or difficult to read may result in a lowered response rate or inaccurate responses. In addition, a measure must be appropriate for use in the targeted population. For example, measures designed for use with heterogeneous populations may not be appropriate for a specific population with certain characteristics. A plethora of measures with known psychometric properties exists, but researchers may need to develop a new measure for a particular construct because no existing measure operationalizes the construct as the researcher has conceptualized it. In these circumstances, a content validity study should be conducted.

VALIDITY

Traditionally, three types of validity may be demonstrated: content, criterion, and construct validity.

Content Validity

Content validity refers to the extent to which the items of a measure assess the same content, or how well the content material was sampled in the measure. Content validity can be characterized as face validity or logical validity. Face validity indicates that the measure appears, on its face, to be valid. Logical validity indicates a more rigorous process, such as using a panel of experts to evaluate the content validity of a measure. Nunnally and Bernstein (1994) did not distinguish among different types of content validity but presented alternative ways of assessing it. They suggested evaluating content validity by demonstrating internal consistency through correlating scores from the measure with another measure of the same construct and by showing change in posttest scores over pretest scores.

Criterion Validity

Criterion validity is demonstrated by finding a statistically significant relationship between a measure and a criterion (Nunnally & Bernstein, 1994). Criterion validity is considered the gold standard, and usually a correlation is used to assess the statistical relationship. For example, the Graduate Record Examination (GRE) has been found to predict graduate school success (as measured by first-year grade-point average) for certain disciplines (Rubio, Rubin, & Brennan, 2003). The three types of criterion validity are postdictive, concurrent, and predictive. If the criterion has already occurred, the validity is postdictive. The validity is concurrent if the criterion exists at the same time as the construct being measured. The GRE example demonstrates predictive validity, because graduate school success (the criterion) occurs after taking the GRE (the measure). According to Nunnally and Bernstein, a correlation of .30 indicates adequate criterion validity.
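To make the benchmark concrete, the following sketch (not from the original article) computes a Pearson correlation between scores on a hypothetical new measure and a hypothetical criterion, then compares the result with the .30 benchmark. The variable names and data are invented solely for illustration.

    # Illustrative sketch only: the scores below are invented, not real data.
    # Computes the Pearson correlation between scores on a hypothetical new
    # measure and a hypothetical criterion (e.g., first-year GPA), then
    # compares the result with Nunnally and Bernstein's .30 benchmark.
    from scipy.stats import pearsonr

    measure_scores = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]
    criterion_gpa = [3.1, 3.6, 2.8, 3.9, 3.5, 3.2, 2.9, 3.8, 3.4, 3.3]

    r, p_value = pearsonr(measure_scores, criterion_gpa)
    print(f"r = {r:.2f}, p = {p_value:.3f}")
    print("Meets the .30 benchmark for adequate criterion validity:", r >= 0.30)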
Construct Validity

Anastasi and Urbina (1997) described construct validity as "the extent to which [a] test may be said to measure a theoretical construct or trait" (p. 126). Three kinds of construct validity are factorial validity, known-groups validity, and convergent and discriminant (or divergent) validity. …