Abstract

In dermatology and elsewhere, genome-wide association study (GWAS) meta-analyses now routinely include data from large-scale population-based biobanks (Zhou et al., 2022). Many examples (Boutin et al., 2020; Han et al., 2020; Mitchell et al., 2022; Tachmazidou et al., 2019) have employed data from UK Biobank, a study of more than 500,000 participants aged 40-69 at recruitment with self-reported and electronic health record (EHR)-derived clinical diagnoses (Bycroft et al., 2018).
However, correct interpretation of genetic or epidemiological associations identified in biobank data should acknowledge that cases selected via study-specific self-report and EHR procedures may be subject to misclassification, or may have a different disease phenotype on average than cases ascertained in a specialist clinical setting and typically used in molecular studies of disease processes (Cai et al., 2020). We focus on chronic plaque psoriasis, reporting a framework that uses genetic effect size estimates to evaluate the consistency between candidate biobank phenotypes and psoriasis diagnosed by a specialist physician. Specifically, we assess the degree to which candidate biobank definitions capture non-psoriasis cases, or (presumably milder) psoriasis cases with lower genetic liability than typical specialist-ascertained cases, by regressing estimated genetic effect sizes at established psoriasis susceptibility loci against reference values obtained from previous GWAS of psoriasis case cohorts where recruitment was based on in-person specialist diagnosis (Tsoi et al., 2017) (Figure 1). Our inverse variance-weighted (IVW) regression slope estimates a lower bound for the positive predictive value (minPPV) for true psoriasis cases among participants selected by the candidate biobank definition (full details in Supplementary Methods).
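To make the slope concrete, the following is a minimal sketch of an inverse variance-weighted regression through the origin, assuming weights of 1/SE² on the candidate-phenotype effect sizes and a multiplicative random-effects adjustment of the standard error. It is not the code used in this study, and the function name `ivw_slope` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def ivw_slope(beta_ref, beta_cand, se_cand):
    """Weighted through-origin regression of candidate betas on reference betas.

    Weights are 1 / se_cand**2 (an assumption of this sketch).
    Returns the slope and its standard error.
    """
    beta_ref = np.asarray(beta_ref, dtype=float)
    beta_cand = np.asarray(beta_cand, dtype=float)
    w = 1.0 / np.asarray(se_cand, dtype=float) ** 2

    denom = np.sum(w * beta_ref ** 2)
    slope = np.sum(w * beta_ref * beta_cand) / denom

    # Fixed-effect standard error of the slope.
    se_fixed = np.sqrt(1.0 / denom)

    # Inflate the SE when the loci scatter more than their SEs predict.
    resid = beta_cand - slope * beta_ref
    phi = np.sqrt(np.sum(w * resid ** 2) / (beta_ref.size - 1))
    return slope, se_fixed * max(1.0, phi)

# Hypothetical log odds ratios at three loci (not values from the study):
# the candidate phenotype shows 60% of each reference effect.
beta_ref = [0.10, 0.20, 0.30]
beta_cand = [0.06, 0.12, 0.18]
se_cand = [0.02, 0.02, 0.03]

slope, se = ivw_slope(beta_ref, beta_cand, se_cand)
print(f"IVW slope = {slope:.3f} (SE {se:.3f})")
```

Under the interpretation given above, a slope near 1 would indicate effect sizes in line with specialist-diagnosed psoriasis, and a slope well below 1 would indicate dilution of the case group.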
We validated our approach on dermatologist-derived case-control psoriasis GWAS and simulated case-control cohorts with known misclassification rate (Supplementary Methods; Supplementary Figure S1, Supplementary Table S4). We applied our method to UK Biobank (unrelated white British participants after quality control; N = 336,733), in which psoriasis cases can be defined using a single data source (self-reporting, linked GP diagnoses or hospital episode statistics; Table 1, Supplementary Table S1), or combinations thereof. Among single-source candidate psoriasis definitions, self-reported psoriasis (N_SRP = 4,244) was most concordant with specialist-diagnosed psoriasis (minPPV_SRP = 66.9%, 95% CI: 61.2-72.6%), and more so when combined with a self-reported psoriasis-relevant medication (N_SRPM = 1,927; minPPV_SRPM = 73.9%, 95% CI: 65.2-82.6%). Psoriasis definitions from hospital episode statistics identified fewer psoriasis cases (N_HESany = 1,726) and were less concordant (minPPV_HESany = 57.9%, 95% CI: 48.9-66.8%). GP-based psoriasis definitions were least concordant with specialist diagnosis (N_GP = 5,768; minPPV_GP = 46.4%, 95% CI: 40.5-52.3%), albeit improving when multiple GP diagnoses were required (N_GP2 = 2,422; minPPV_GP2 = 58.6%, 95% CI: 50.3-66.9%).

Table 1. List of selected candidate UK Biobank psoriasis phenotypes, with abbreviations, case numbers before and after genotyping QC, IVW estimate, and power to detect a common (MAF=30%) risk factor of weak effect (OR=1.1).

| Abbreviation | Phenotype description | Number of psoriasis cases (all) | Number of psoriasis cases (genotyped, white British unrelated) | IVW regression slope (∼minPPV), mean (95% CI) (vs. selected controls, n=141,279) | Power to detect common weak effect (vs. selected controls) |
|---|---|---|---|---|---|
| Single data source | | | | | |
| SRP | Self-reported psoriasis | 6,110 | 4,244 | 0.669 (0.612 – 0.726) | 0.478 |
| SRPM | Self-reported psoriasis and medication relevant to psoriasis | 2,750 | 1,927 | 0.739 (0.652 – 0.826) | 0.296 |
| HESmain | Psoriasis as main diagnosis in linked hospital episode statistics | 449 | 289 | 0.605 (0.422 – 0.788) | 0.077 |
| HESsec | Psoriasis as secondary diagnosis in linked hospital episode statistics | 2,300 | 1,532 | 0.587 (0.491 – 0.683) | 0.175 |
| HESany | Psoriasis as main or secondary diagnosis in linked hospital episode statistics | 2,593 | 1,726 | 0.579 (0.489 – 0.668) | 0.178 |
| GPraw | Psoriasis diagnosis in linked GP data, using read codes corresponding to ICD-10 psoriasis codes in UK Biobank mapping file | 11,560 | 7,956 | 0.324 (0.279 – 0.37) | 0.243 |
| GP | Psoriasis diagnosis in linked GP data, using curated list of read codes | 8,444 | 5,768 | 0.464 (0.405 – 0.523) | 0.340 |
| GP2 | Two or more psoriasis diagnoses in GP data using curated read codes | 3,472 | 2,422 | 0.586 (0.503 – 0.669) | 0.242 |
| GP3 | Three or more psoriasis diagnoses in GP data using curated read codes | 1,984 | 1,389 | 0.614 (0.515 – 0.714) | 0.172 |
| Combined data sources | | | | | |
| 1-SRP-HESany | Any one of SRP or HESany | 7,568 | 5,194 | 0.624 (0.57 – 0.677) | 0.499 |
| 1-SRP-GP | Any one of SRP or GP | 12,616 | 8,647 | 0.517 (0.471 – 0.563) | 0.543 |
| 1-SRP-GP2 | Any one of SRP or GP2 | 8,320 | 5,786 | 0.615 (0.559 – 0.67) | 0.538 |
| 1-SRP-HESany-GP | Any one of SRP, HESany or GP | 13,666 | 9,316 | 0.508 (0.463 – 0.553) | 0.561 |
| 1-SRP-HESany-GP2 | Any one of SRP, HESany or GP2 | 9,546 | 6,574 | 0.585 (0.535 – 0.636) | 0.542 |
| 2-SRP-HESany-GP | Any two of SRP, HESany or GP | 3,056 | 2,122 | 0.721 (0.638 – 0.805) | 0.303 |
| All-SRP-HESany-GP | All three of SRP, HESany or GP | 425 | 300 | 0.818 (0.616 – 1.02) | 0.105 |
| 2-SRP-SRM-HESany-GP | Any two of SRP, SRM, HESany or GP | 5,080 | 3,499 | 0.696 (0.628 – 0.763) | 0.443 |
| 2-SRP-SRM-HESany-GP2 | Any two of SRP, SRM, HESany or GP2 | 4,291 | 2,965 | 0.726 (0.66 – 0.792) | 0.416 |
| 3-SRP-SRM-HESany-GP | Any three of SRP, SRM, HESany or GP | 1,599 | 1,122 | 0.771 (0.675 – 0.866) | 0.216 |
| All-SRP-SRM-HESany-GP | All four of SRP, SRM, HESany or GP | 262 | 185 | 0.87 (0.637 – 1.104) | 0.084 |
| Phenotypes incorporating psoriatic arthritis codes | | | | | |
| SRP+PsA | Self-reported psoriasis or psoriatic arthritis (PsA) | 6,636 | 4,603 | 0.664 (0.606 – 0.721) | 0.503 |
| SRPM+PsA | Self-reported psoriasis or PsA, and psoriasis-relevant medication | 3,013 | 2,107 | 0.747 (0.661 – 0.832) | 0.330 |
| HESany+PsA | Psoriasis or PsA as main or secondary diagnosis in linked hospital episode statistics | 3,388 | 2,272 | 0.616 (0.53 – 0.703) | 0.252 |
| GP+PsA | Psoriasis or PsA diagnosis in linked GP data using curated read codes | 8,808 | 6,024 | 0.457 (0.398 – 0.517) | 0.348 |
| 1-SRP-HESany-GP+PsA | Any one of SRP+PsA, HESany+PsA or GP+PsA | 14,475 | 9,864 | 0.51 (0.464 – 0.555) | 0.582 |
| 2-SRP-HESany-GP+PsA | Any two of SRP+PsA, HESany+PsA or GP+PsA | 3,719 | 2,579 | 0.713 (0.629 – 0.797) | 0.348 |
| 2-SRP-SRM-HESany-GP+PsA | Any two of SRP+PsA, SRM+PsA, HESany+PsA or GP+PsA | 5,692 | 3,917 | 0.688 (0.62 – 0.756) | 0.468 |

minPPV: lower bound of positive predictive value for psoriasis phenotype (i.e. IVW regression slope).

We recognise that the large sample sizes afforded by biobank studies may offset limitations in phenotype stringency when considering statistical power to detect novel genetic and epidemiological associations (Supplementary Figure S2). We therefore estimated power to detect an association with a novel psoriasis risk factor (population frequency 0.3, odds ratio 1.1; Table 1) (details and results for other scenarios in Supplementary Methods; Supplementary Figure S3). Among single-source candidate definitions in UK Biobank, self-reported psoriasis demonstrated highest power for discovery (power_SRP = 47.8%), substantially higher than the larger but less concordant GP-based definition (power_GP = 34.0%). We then considered composite psoriasis definitions based on multiple data sources.
Requiring a single coding from any source conferred limited agreement with specialist-defined psoriasis (minPPV_1-SRP-HESany-GP = 50.8%, 95% CI: 46.3-55.3%) but large case numbers, such that statistical power for discovery exceeded all other definitions (N_1-SRP-HESany-GP = 9,316; power_1-SRP-HESany-GP = 56.1%). Requiring two independent corroborative codings improved concordance with specialist-defined psoriasis to ∼70% (minPPV_2-SRP-HESany-GP = 72.2%, 95% CI: 63.8-80.5%; minPPV_2-SRP-SRM-HESany-GP = 69.6%, 95% CI: 62.8-76.3%), although power (power_2-SRP-HESany-GP = 30.3%; power_2-SRP-SRM-HESany-GP = 44.3%) remained lower than the top-performing single-source definition (power_SRP = 47.8%). UK Biobank participants with psoriasis codings across all data sources demonstrated high concordance (minPPV_All-SRP-HESany-GP = 81.8%, 95% CI: 61.6-102.0%; minPPV_All-SRP-SRM-HESany-GP = 87.0%, 95% CI: 63.7-110.4%), with confidence intervals crossing 100%. This is consistent with our positive control GWAS studies, which had slope estimates between 0.9 and 1.1 with confidence intervals crossing 1 (Supplementary Table S4), the smallest cohort (n=464 cases) being the only exception. Our estimated minPPV of self-reported psoriasis in UK Biobank (67%) is much higher than the value previously reported for self-reported psoriasis in 23andMe (36%) (Tsoi et al., 2017). This may be due to ascertainment differences: rather than completing an online questionnaire, UK Biobank participants are interviewed by a trained research nurse and are required to have seen a doctor for each reported condition (UK Biobank 2012).
Primary care and hospital data may have lower estimated minPPVs than self-reporting because of misclassification arising from the difficulty of non-specialist differential diagnosis of psoriasis from other common lesional skin diseases. Alternatively, patients diagnosed through primary care or hospital episodes (where most recorded diagnoses are secondary) may suffer from milder psoriasis on average, with consequent reduced genetic liability, than those included in dermatologist-diagnosed psoriasis GWAS studies; previous work showed that 90% of psoriasis primary care diagnoses were subsequently confirmed by GP reviewers (Seminara et al., 2011). The relatively low regression slope estimates for UK Biobank psoriasis indicators may therefore represent not only case misclassification but also a lower genetic liability for psoriasis among patients with mild disease than among those with severe disease. We recognise that without a formal validation exercise, methods such as the one we present here are unable to distinguish between these scenarios. However, given that most molecular research into psoriasis biology is conducted in moderate-severe patients, our IVW slope estimates remain valuable as a measure of aggregate genetic risk among cases, equivalent to a PPV for dermatologist-ascertained psoriasis. The optimal psoriasis definition for future genetic and epidemiological investigations will depend on the specific research aims.
In UK Biobank we recommend that discovery research, where statistical power is a priority, defines cases using any self-reported or EHR psoriasis coding (our maximum statistical power of 58% should be interpreted in the context of contributing to larger meta-analyses); studies requiring accurate effect size estimates and high concordance with dermatologist-diagnosed psoriasis are encouraged to use two or more data sources. We also recommend the inclusion of PsA diagnostic codes for the beneficial effect on sample size with minimal drop-off in concordance (Supplementary Table S2). It remains unclear whether concordance is unaffected because PsA-only participants have cutaneous psoriasis not coded in UK Biobank, or because PsA shares genetic risk loci with cutaneous psoriasis. In UK Biobank, a definition requiring only self-reporting of psoriasis balances high diagnostic validity and statistical power; generalisation of this finding to other datasets may depend on the ascertainment method. To facilitate such assessments, we have demonstrated here an approach to assess the composition of psoriasis diagnoses when assembling future cohorts from large EHR/questionnaire-based biobank studies.
The UK Biobank resource is available to bona fide researchers for health-related research in the public interest (https://www.ukbiobank.ac.uk/enable-your-research). Biomarkers of Systemic Treatment Outcomes in Psoriasis (BSTOP) data are available for approved research use by making an application to the BSTOP Data Access Committee (https://www.kcl.ac.uk/lsm/research/divisions/gmm/departments/dermatology/research/stru/groups/bstop/documents).
UK Biobank. UK Biobank Resource 100235: The verbal interview within ACE centres. https://biobank.ctsu.ox.ac.uk/showcase/ukb/docs/TouchscreenQuestionsMainFinal.pdf. 2012.
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement number 821511 (Biomarkers in Atopic Dermatitis and Psoriasis). The Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and the European Federation of Pharmaceutical Industries and Associations. This research has been conducted using the UK Biobank Resource (approved project 15147) and uses data provided by patients and collected by the NHS as part of their care and support. ND received funding from Health Data Research UK (MR/S003126/1), which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust. SKM is funded by an MRC Clinical Academic Research Partnership Award (MR/T02383X/1). We would like to thank the Psoriasis Association for ongoing support and funding since the inception of Biomarkers of Systemic Treatment Outcomes in Psoriasis (reference: RG2/10: RG2/10). The authors acknowledge the invaluable support of the NIHR through the clinical research networks and its contribution in facilitating recruitment to Biomarkers of Systemic Treatment Outcomes in Psoriasis.
Members of the BSTOP Study Group who contributed to the collection of valuable clinical information and samples for profiling (excluding individually named authors of this work) are Nadia Aldoori, Mahmud Ali, Alex Anstey, Fiona Antony, Charles Archer, Suzanna August, Periasamy Balasubramaniam, Kay Baxter, Anthony Bewley, Alexandra Bonsall, Victoria Brown, Katya Burova, Aamir Butt, Mel Caswell, Sandeep Cliff, Mihaela Costache, Sharmela Darne, Emily Davies, Claudia DeGiovanni, Trupti Desai, Bernadette DeSilva, Victoria Diba, Eva Domanne, Harvey Dymond, Caoimhe Fahy, Leila Ferguson, Maria-Angeliki Gkini, Alison Godwin, Fiona Hammonds, Sarah Johnson, Teresa Joseph, Manju Kalavala, Mohsen Khorshid, Liberta Labinoti, Nicole Lawson, Alison Layton, Tara Lees, Nick Levell, Helen Lewis, Calum Lyon, Sandy McBride, Sally McCormack, Kevin McKenna, Serap Mellor, Ruth Murphy, Paul Norris, Caroline Owen, Urvi Popli, Gay Perera, Nabil Ponnambath, Helen Ramsay, Aruni Ranasinghe, Saskia Reeken, Rebecca Rose, Rada Rotarescu, Ingrid Salvary, Kathy Sands, Tapati Sinha, Simina Stefanescu, Kavitha Sundararaj, Kathy Taghipour, Michelle Taylor, Michelle Thomson, Joanne Topliffe, Roberto Verdolini, Rachel Wachsmuth, Martin Wade, Shyamal Wahie, Sarah Walsh, Shernaz Walton, Louise Wilcox, and Andrew Wright.
UK Biobank data and phenotype definition
This project used UK Biobank data under approved project number 15147. UK Biobank is a prospective study with over 500,000 participants aged 40–69 years when recruited in 2006–2010 (Bycroft et al. 2018). The study has collected and continues to collect extensive phenotypic and genotypic detail about its participants, including data from questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes.
Linkage to health record data comprises HES data that documents hospital inpatient visits, and primary care data (currently available for 230,105 participants). The UK Biobank study was approved by the National Health Service National Research Ethics Service (ref. 11/NW/0382), and all participants provided written informed consent. We defined nine candidate psoriasis definitions based on a single source data type (Table 1): two based on self-reported illnesses and medications, three based on linked hospital episode statistics (HES) data, and four based on linked primary care data. Full details of the codes included are given in Supplementary Table S1. Self-reported medications include prescription medications being taken regularly by participants at the time of their assessment centre visit (UK Biobank field 20003). The full list of medications reported by participants self-reporting psoriasis was reviewed by dermatologists (SKM, CHS) to identify relevant psoriasis medications. Linked primary care data includes two types of read code: readV2 and readCTV3 (NHS Digital 2020). We included codes of both types. We further distinguished candidate primary care phenotypes based on read codes that corresponded to ICD-10 L40 codes in a UK Biobank mapping file (Table 1: GPraw) (UK Biobank 2021) from those based on a previously validated list of psoriasis read codes (Table 1: GP) (Seminara et al. 2011). Validated read code lists in readV2 format were mapped to readCTV3 using the UK Biobank mapping file (UK Biobank 2021). We further considered candidate psoriasis definitions based on combining data sources. These ranged from broader definitions where a single psoriasis coding across data sources would be sufficient, to stricter definitions requiring psoriasis codings from multiple data sources (Table 1). 
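The composite definitions described above ("any one of", "any two of", "all of") amount to counting how many data sources flag each participant. The sketch below illustrates that logic on toy data; the participant IDs, the source contents and the helper name `composite_cases` are hypothetical and are not drawn from UK Biobank.

```python
from collections import Counter

def composite_cases(sources, min_sources):
    """Return participant IDs flagged by at least `min_sources` data sources.

    `sources` maps a source label (e.g. "SRP") to a set of participant IDs.
    """
    counts = Counter(pid for ids in sources.values() for pid in ids)
    return {pid for pid, n in counts.items() if n >= min_sources}

# Toy example with made-up participant IDs.
sources = {
    "SRP": {1, 2, 3},      # self-reported psoriasis
    "HESany": {2, 3, 4},   # any hospital episode coding
    "GP": {3, 5},          # curated primary care read codes
}

any_one = composite_cases(sources, 1)                 # broadest definition
any_two = composite_cases(sources, 2)                 # two corroborating sources
all_three = composite_cases(sources, len(sources))    # strictest definition

print(sorted(any_one), sorted(any_two), sorted(all_three))
```

Raising `min_sources` shrinks the case set, which is the trade-off between case numbers and stringency that the candidate definitions in Table 1 explore.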
Since psoriatic arthritis (PsA) typically presents with skin lesions, we considered additional candidate psoriasis definitions based on expanded lists of self-report, primary care (Ogdie et al. 2013) and HES codes that included PsA (Table 1, Supplementary Table S1).
UK Biobank genotype data and association testing
The UK Biobank central team performed genotype calling and imputation. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array (n∼50,000) and the Affymetrix UK Biobank Axiom array (n∼450,000) (Bycroft et al. 2018). Based on QC metrics provided by UK Biobank, we removed samples that exhibited gender mismatch, excess relatedness, heterozygosity or missingness > 5%, and extracted individuals determined by UK Biobank to form an unrelated subset of homogeneous (White British) ancestry. We then removed additional individuals with low call rates (<98%) in well-called (>90%) markers, giving 336,814 samples for subsequent analysis (336,733 after withdrawals). Genome-wide imputation was performed by the UK Biobank central team using IMPUTE2 software and a reference panel derived from UK10K and 1000 Genomes Phase 3 haplotypes (Howie et al. 2011; Howie et al. 2009). For subsequent analysis we considered variants with imputation R² > 0.7 and minor allele frequency > 0.5%. We performed association testing at 35 variants of interest (see below) for each candidate psoriasis definition we generated. Each definition provided a set of participants to be considered psoriasis cases. For unaffected controls, we included participants that had linked primary care data and were negative for psoriasis under all candidate psoriasis definitions (n = 141,279). We fitted a logistic regression for each variant using PLINK v2.0 (Chang et al. 2015), using 20 ancestry principal components and genotyping array as covariates.
Genetic instrument selection and regression
To derive a reference genetic instrument representative of dermatologist-diagnosed psoriasis, summary statistics from seven dermatologist-derived case-control GWAS studies (totalling 13,229 cases and 21,543 controls) (Tsoi et al. 2017) were analysed using an inverse variance-weighted fixed effect meta-analysis. We identified 38 independent genome-wide significant (P < 5×10⁻⁸) associations at least 1 Mb apart. This excluded associations in the MHC region on chromosome 6: the strong association between HLA-C*06:02 and psoriasis age of onset means that estimated effect sizes at this locus are strongly influenced by ascertainment strategy, and comparison across studies is complex. Of the 38 lead variants, 35 were available in the UK Biobank imputed genetic dataset, while the remaining 3 were unavailable with no suitable proxy found using the LDLink platform (Machiela and Chanock 2015). For each candidate UK Biobank psoriasis definition, effect sizes (betas) at the 35 lead variants were regressed against effect sizes from the reference instrument (Supplementary Table S5), weighted by inverse variance to give higher weight to loci with more confident effect size estimates, using function mr_ivw from R package MendelianRandomization (version 0.5.0) (Yavorska and Burgess 2017). The slope of the regression line indicates how depressed, on average, the effect sizes of the candidate psoriasis phenotype are in comparison to established psoriasis effect sizes (Figure 1); a slope of 1 would indicate effect sizes consistent with those already established for dermatologist-derived psoriasis (full results in Supplementary Table S2). To understand how accuracy and statistical power would be affected by a less stringent definition of controls, we fitted alternative regression models in which all participants not positive for the candidate psoriasis definition were included as “unselected” controls (Supplementary Table S3).
We observed slightly higher regression slopes when using selected controls than when using unselected controls (Supplementary Figure S4).
IVW regression slope interpretation
With the assumption that the unaffected control group of UK Biobank participants was representative of the control datasets in dermatologist-derived psoriasis GWAS, we considered that any effect size depression relative to the reference genetic instrument could be driven by the inclusion of misclassified individuals within the psoriasis cases (compared to the specialist-diagnosed psoriasis cohorts from Tsoi et al.). This may be through incorrect self-report, or misdiagnosis by non-dermatologists in linked HES or primary care data. At any single locus, the degree of effect size depression will depend on the positive predictive value (PPV; true positives / (true positives + false positives)) of the candidate UKB definition but also on the allele frequency and the magnitude of the established effect. The relationship between the PPV and the IVW regression slope is therefore complex, and we undertook simulations to inform our interpretation of IVW regression slopes. Using PLINK v1.9 (Chang et al., 2015), we simulated genetic datasets representing affected and unaffected individuals for 35 SNPs with effect size and frequency equivalent to those in the reference genetic instrument. In each simulation we included 125,000 controls from simulated unaffected individuals, and 5,000 cases that were a mix of simulated affected (true positive) and unaffected (false positive) individuals. The simulated PPV was varied from 0 to 100% in 10% increments, and we performed 1,000 simulations per dilution level. We calculated the IVW slope for each simulation, and observed an approximately linear relationship between the IVW slope and PPV (Supplementary Figure S1).
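The near-linear relationship can also be seen without random sampling, by computing the expected allele frequency in a diluted case group. The sketch below is an expected-value calculation rather than the PLINK simulation described above; it assumes an allelic (per-allele) odds ratio model, an approximate variance for the allelic log odds ratio, and five hypothetical SNPs whose frequencies and odds ratios are illustrative and are not the 35 reference variants.

```python
import numpy as np

def expected_ivw_slope(ppv, f0, odds_ratio, n_cases=5_000, n_controls=125_000):
    """Expected IVW slope when a fraction `ppv` of cases are true cases.

    f0: risk allele frequency in unaffected individuals (per SNP).
    odds_ratio: per-allele odds ratio in true cases (per SNP).
    """
    f0 = np.asarray(f0, dtype=float)
    odds_ratio = np.asarray(odds_ratio, dtype=float)

    odds0 = f0 / (1 - f0)
    f1 = odds_ratio * odds0 / (1 + odds_ratio * odds0)  # frequency in true cases
    f_mix = ppv * f1 + (1 - ppv) * f0                   # diluted case group

    beta_ref = np.log(odds_ratio)                       # reference effect
    beta_obs = np.log((f_mix / (1 - f_mix)) / odds0)    # depressed effect

    # Approximate variance of the allelic log odds ratio (2N alleles per group).
    var = 1 / (2 * n_cases * f_mix * (1 - f_mix)) + 1 / (2 * n_controls * f0 * (1 - f0))
    w = 1 / var
    return float(np.sum(w * beta_ref * beta_obs) / np.sum(w * beta_ref ** 2))

# Hypothetical SNP panel (illustrative values only).
freqs = [0.1, 0.2, 0.3, 0.4, 0.5]
ors = [1.5, 1.3, 1.2, 1.15, 1.1]

ppv_grid = np.linspace(0, 1, 11)
slopes = np.array([expected_ivw_slope(p, freqs, ors) for p in ppv_grid])
for p, s in zip(ppv_grid, slopes):
    print(f"PPV = {p:.1f} -> expected slope = {s:.3f}")
```

Under these assumptions the expected slope runs from 0 with no true cases to 1 with no misclassification and stays close to the PPV in between, which is the behaviour the simulations were designed to check.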
In UK Biobank, effect size depression relative to the reference genetic instrument may not be exclusively due to false positives (individuals with incorrect psoriasis diagnoses/self-reports), but also due to individuals with milder psoriasis (with lower genetic liability than typical specialist-ascertained cases). Our IVW slopes can therefore be interpreted as representing the lower bound of the PPV (minPPV) for each candidate UK Biobank psoriasis phenotype. To ensure that effect sizes were not systematically affected by differences in risk allele frequencies between Tsoi et al. and UK Biobank populations (due to differences in ancestry), risk allele frequencies were compared between the control populations of the two studies (Supplementary Figure S5) and shown to be highly concordant. 23andMe data was excluded from the Tsoi et al. cohorts, and the CASP cohort genotype data was unavailable.
Power calculations
For each candidate UKB psoriasis phenotype, we estimated the statistical power to detect association with a novel genetic or epidemiological psoriasis risk factor. We took an empirical approach, based on a simple 2×2 contingency table that assumed true population proportions P_EA (exposed, affected), P_NA (not exposed, affected), P_EU (exposed, unaffected) and P_NU (not exposed, unaffected). For a specified odds ratio (P_EA·P_NU / (P_NA·P_EU)), psoriasis prevalence (P_A = P_EA + P_NA) and risk factor prevalence (P_E = P_EA + P_EU), implied values for these proportions can be calculated using the fact that proportions sum to 1. For each candidate UK Biobank psoriasis definition, these population proportions were used to simulate case and control samples of the same size, with the IVW slope estimate determining the simulated PPV.
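As an illustration of how the four proportions follow from the odds ratio and the two margins, the sketch below solves the implied quadratic for the exposed-and-affected proportion. This is one possible way to carry out the calculation, not necessarily the one used here; the function name `solve_2x2` is hypothetical, and the root selection assumes moderate values such as those considered in this work (odds ratio near 1, prevalence well below 50%).

```python
import math

def solve_2x2(odds_ratio, p_affected, p_exposed):
    """Return (p_ea, p_na, p_eu, p_nu) for a given odds ratio and margins.

    With x = p_ea, the constraint
        odds_ratio = x * (1 - p_affected - p_exposed + x)
                     / ((p_affected - x) * (p_exposed - x))
    rearranges to a*x**2 + b*x + c = 0 with the coefficients below.
    """
    a = 1.0 - odds_ratio
    b = 1.0 + (odds_ratio - 1.0) * (p_affected + p_exposed)
    c = -odds_ratio * p_affected * p_exposed

    # Numerically stable form of the admissible root; it also covers
    # odds_ratio == 1, where it reduces to p_affected * p_exposed.
    disc = b * b - 4.0 * a * c
    p_ea = -2.0 * c / (b + math.sqrt(disc))

    p_na = p_affected - p_ea
    p_eu = p_exposed - p_ea
    p_nu = 1.0 - p_ea - p_na - p_eu
    return p_ea, p_na, p_eu, p_nu

# Common, weak risk factor scenario: OR 1.1, prevalence 2%, exposure 30%.
p_ea, p_na, p_eu, p_nu = solve_2x2(1.1, 0.02, 0.30)
print(f"P_EA={p_ea:.5f}  P_NA={p_na:.5f}  P_EU={p_eu:.5f}  P_NU={p_nu:.5f}")
print(f"implied odds ratio = {p_ea * p_nu / (p_na * p_eu):.4f}")
```

The ratios p_ea / p_affected and p_eu / (1 - p_affected) then give the exposure probabilities among affected and unaffected individuals respectively.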
A Bernoulli function was used to generate risk factor-exposed and unexposed individuals, using P_EA/P_A as the probability of true positive cases being exposed to the risk factor, and P_EU/P_U as the probability of false positive cases and controls being exposed to the risk factor. Each power calculation was based on 10,000 simulations, with power estimated as the proportion of simulations for which a 2×2 chi-squared statistic was above 3.841 (P<0.05). For all power calculations we assumed a psoriasis prevalence of 2%. For each candidate psoriasis definition, we calculated power for four scenarios based on a hypothetical risk factor’s population frequency (common: 30%; rare: 5%) and odds ratio (weak: 1.1; strong: 1.25). For brevity, only the power estimates for common, weak risk factors are presented in the main text. When estimating power for discovery using unselected controls, we took the estimate of true case dilution (PPV) from the IVW slope corresponding to selected controls. This is because the case group is not influenced by control selection, and our simulations that demonstrated a linear relationship were based on selected controls.
Supplementary Figure S2 – IVW slope estimate by number of cases across all candidate UK Biobank phenotypes. Legend: IVW slope estimate plotted against the number of cases for each of the UK Biobank psoriasis definitions in Supplementary Table S2.
Supplementary Figure S3 – Power to detect novel psoriasis-associated risk factors for top-performing candidate psoriasis definitions. Legend: Comparison of phenotype IVW estimate (minPPV) and statistical power to detect novel psoriasis-associated risk factors under various scenarios for selected candidate psoriasis definitions that score highly on both measures.
Y-axis: estimated statistical power; x-axis: IVW estimate of candidate psoriasis definition; left column: common risk factor (frequency 0.3); right column: rare risk factor (frequency 0.05); top row: weak risk factor (odds ratio 1.1); bottom row: strong risk factor (odds ratio 1.25). Candidate phenotype abbreviations are described in Table 1.
Supplementary Figure S4 – Comparison of candidate psoriasis definition IVW using selected and unselected controls. Legend: IVW slope estimates for UK Biobank psoriasis definitions with selected controls (Supplementary Table S2) plotted against IVW slope estimates for equivalent UK Biobank psoriasis definitions with unselected controls (Supplementary Table S3). Red line (x=y) represents equality.
