Observational studies are rarely representative of their target population because there are known and unknown factors that affect an individual's choice to participate (the selection mechanism). Selection can cause bias in a given analysis if the outcome is related to selection (conditional on the other variables in the model). Detecting and adjusting for selection bias in practice typically requires access to data on nonselected individuals. Here, we propose methods to detect selection bias in genetic studies by comparing correlations among genetic variants in the selected sample to those expected under no selection. We examine the use of four hypothesis tests to identify induced associations between genetic variants in the selected sample. We evaluate these approaches in Monte Carlo simulations. Finally, we use these approaches in an applied example using data from the UK Biobank (UKBB). The proposed tests suggested an association between alcohol consumption and selection into UKBB. Hence, UKBB analyses with alcohol consumption as the exposure or outcome may be biased by this selection.
Read full abstract