Background Studies of common genetic variants in complex psychiatric disorders such as major depression require samples (tens or hundreds of thousands of individuals) that are larger than can typically be collected with phenotyping methods involving direct interviews by trained clinicians. Many studies are now identifying cases and controls with self-report questionnaires, electronic medical records, or registry data. Few data are available to determine the accuracy of these methods compared to direct interviews. Here, we present data on 1,263 individuals for whom both online and direct interview data were available.

Methods The Depression Genes and Networks study (DGN) recruited cases with recurrent major depressive disorder (MDD) and controls with no lifetime MDD for a study of whole-blood gene expression. A research company emailed invitations to 14,463 survey panel members; 9,569 completed an online screen including the CIDI-Short Form (CIDI-SF) depression and substance dependence modules, of whom 1,263 eventually gave blood samples and completed telephone SCID-IV interviews: 669 prospective cases (CIDI-SF recurrent MDD without current substance dependence) and 594 prospective controls (no 2-week period of depression or anhedonia with >2 MDD criteria; no current substance dependence). For this analysis we also identified narrow screening criteria for controls (no 2-week period with both depression and anhedonia; or one of these but not most of the day, nearly every day; no lifetime antidepressant [AD] use [SCID]). Polygenic risk score (PRS) predictions were then examined in the GenRED-I GWAS cohort (which used non-overlapping controls screened with CIDI-SF) using broad vs. narrow controls.
Results Among prospective cases, the SCID diagnosis was recurrent MDD (MDD-R; N=547, 81.8%; some were excluded from DGN for other reasons) or uncomplicated single-episode MDD (39, 5.8%), totaling 87.6% with diagnoses that would be included in most genetic studies of MDD; 18 (2.7%) had major depressive episodes with complications that would typically be excluded (bereavement, medical or substance-related factors), and 65 (9.7%) had other exclusions such as bipolar disorders (28, 4.2%), no MDD (30, 4.5%), or unreliable histories (7, 1%). Among prospective controls, 108 (18.2%) were excluded by SCID, primarily for MDD (57, 9.6%), sub-threshold depression (25, 4.2%), or a bipolar disorder (6, 1%). Narrow screening criteria would have excluded 76 (70%) of the 108 ineligible controls (57 by CIDI-SF clinical screening, 19 for self-reported AD use) and 44 (9.1%) of the SCID-eligible controls, thus predicting 93.2% validation by SCID. PRS prediction was improved in the GenRED cohort using only narrow controls.

Discussion Online CIDI-SF screening for MDD-R has a high rate of validation by SCID interviews: 81.8% for MDD-R, and 87.6% if uncomplicated single-episode MDD is considered part of the genetic spectrum. Only 4.5% had no major depressive episode by SCID, but 4.2% received bipolar diagnoses. Although self-report screening for mania is only modestly accurate, we recommend using it to increase the proportion of true-positive cases. The false-positive rate in controls could be largely controlled by using a very strict threshold for self-reported history of depression (essentially, not endorsing depression screening items) and by excluding anyone with any self-reported antidepressant use. This excludes a small proportion of true controls, but an excess of prospective controls is typically available.
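
The predicted 93.2% validation rate for narrowly screened controls follows directly from the counts reported in the Results. The short sketch below reproduces that arithmetic; the variable names are illustrative and not taken from the study, and the only inputs are the counts quoted in the abstract.

```python
# Counts quoted in the Results; all variable names are illustrative.
prospective_controls = 594   # controls who completed the SCID interview
scid_ineligible = 108        # controls excluded by SCID
scid_eligible = prospective_controls - scid_ineligible

# Effect of the narrow screening criteria on each group.
ineligible_removed = 76      # ineligible controls the narrow criteria would exclude
eligible_removed = 44        # SCID-eligible controls the narrow criteria would also exclude

remaining_eligible = scid_eligible - eligible_removed
remaining_ineligible = scid_ineligible - ineligible_removed
remaining_total = remaining_eligible + remaining_ineligible

# Share of controls validated by SCID under broad vs. narrow screening.
broad_validation = scid_eligible / prospective_controls
narrow_validation = remaining_eligible / remaining_total

print(f"Broad controls validated by SCID:  {broad_validation:.1%}")
print(f"Narrow controls validated by SCID: {narrow_validation:.1%}")
```

Under these counts, 474 controls would remain after narrow screening, of whom 442 are SCID-eligible, which is the source of the 93.2% figure; the broad criteria correspond to 486 of 594 (81.8%).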