Abstract

Interviewer ratings of respondents’ physical appearance have been collected in several major social surveys. While researchers have made good use of these ratings in substantive studies, empirical evidence on their measurement properties remains limited. This study evaluates two potential threats to the quality of such ratings: interviewer effects and halo effects. Using data from the China Family Panel Studies and cross-classified models, we find large interviewer effects on ratings of respondents’ physical appearance. We also find possible evidence of halo effects: after controlling for interviewer effects, physical appearance ratings are still highly correlated with other theoretically distinct constructs. However, once both interviewer effects and halo effects are controlled for, we find support for the convergent and discriminant validity of physical appearance ratings. Empirical studies using interviewer observation data should account for interviewer effects and halo effects where possible, or at least discuss their potential impact on the substantive findings.