Abstract

Social scientists increasingly form composite datasets using data from different survey programs, which often use different single-question instruments to measure the same latent construct. This creates an obstacle when we want to run analyses on the combined data, since scores measured with different instruments are not necessarily comparable. In this paper, I explore one consequence of such comparability problems. Specifically, I examine the case where instruments measuring the same construct have different item difficulties. This means that if we applied the instruments to the same population, we would get different mean responses. If such mean differences are not mitigated before combining data, we introduce a mean bias into our composite data. Such mean bias has direct consequences for analyses based on the combined data. In data drawn from the same population, mean bias introduces error variance. In data drawn from different populations, it can bias or even invert true population differences. Beyond these direct consequences, I demonstrate that mean bias can also distort bivariate correlations if one or both variables in a composite dataset are subject to mean bias. If differences in item difficulty are not mitigated before combining data, we introduce a variant of Simpson’s paradox into our data: the bivariate correlation in each source survey might differ substantially from the correlation in the composite dataset. In a set of systematic simulations, I demonstrate this correlation bias effect and show how it changes depending on the mean biases in each variable and the strength of the underlying true correlation.
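
The correlation bias described above can be illustrated with a minimal simulation sketch. It is not the simulation design used in the paper. It assumes two surveys of equal size drawn from the same population, continuous standard-normal scores, and a mean bias modeled as a constant additive shift in the second survey. The function names, parameter names, and default values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_pooled_correlation(rho, shift_x, shift_y, n=50_000, seed=0):
    """Return (r_survey_a, r_survey_b, r_pooled) for two simulated surveys.

    Assumptions (illustrative only): both surveys sample the same population
    with true correlation `rho`; survey B's instruments add the constant
    shifts `shift_x` and `shift_y` to the observed scores.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]

    # Same population and same true correlation in both surveys.
    survey_a = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    survey_b = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Mean bias: survey B's instruments shift the observed means.
    survey_b = survey_b + np.array([shift_x, shift_y])

    r_a = np.corrcoef(survey_a[:, 0], survey_a[:, 1])[0, 1]
    r_b = np.corrcoef(survey_b[:, 0], survey_b[:, 1])[0, 1]

    # Combine the data without mitigating the mean differences.
    pooled = np.vstack([survey_a, survey_b])
    r_pooled = np.corrcoef(pooled[:, 0], pooled[:, 1])[0, 1]
    return r_a, r_b, r_pooled


def expected_pooled_correlation(rho, shift_x, shift_y):
    """Closed-form pooled correlation under the same assumptions.

    With equal group sizes and unit variances, a mean shift d adds d**2 / 4
    to the pooled variance, and the two shifts add shift_x * shift_y / 4
    to the pooled covariance.
    """
    cov_xy = rho + shift_x * shift_y / 4.0
    var_x = 1.0 + shift_x**2 / 4.0
    var_y = 1.0 + shift_y**2 / 4.0
    return cov_xy / np.sqrt(var_x * var_y)


if __name__ == "__main__":
    for shifts in [(1.0, 1.0), (1.0, -1.0), (1.0, 0.0)]:
        r_a, r_b, r_pooled = simulate_pooled_correlation(0.3, *shifts)
        print(shifts, round(r_a, 3), round(r_b, 3), round(r_pooled, 3))
```

Under these assumptions, both within-survey correlations stay close to the true value, while the pooled correlation moves with the shifts. For a true correlation of 0.3, the closed-form expression gives 0.44 when both variables are shifted by one standard deviation in the same direction and 0.04 when they are shifted in opposite directions. When only one variable is shifted, the pooled correlation is attenuated, because the shift adds variance but no covariance.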
