Cases of a disease can be defined broadly, using diagnostic codes, or narrowly, using gold-standard confirmation that is often unavailable in large administrative datasets. These definitions can substantially affect the results and conclusions of studies. We conducted this study to assess how defining melanoma cases by phecodes, rather than by histologic confirmation of invasive melanoma or melanoma in situ, affects the results of a genome-wide association study (GWAS) in the Million Veteran Program. Melanoma status was determined in three ways: (1) the presence of two or more melanoma phecodes, (2) histologically confirmed invasive melanoma, and (3) histologically confirmed melanoma in situ. We conducted a GWAS of variants with a minor allele frequency of 1% or greater. There were 45,665 cases in the phecode cohort, 5,364 in the confirmed invasive melanoma cohort, and 4,792 in the confirmed melanoma in situ cohort. There were 20,457 variants significant at the genome-wide level in the phecode cohort, 2,582 in the invasive melanoma cohort, and 1,989 in the melanoma in situ cohort. Most of the variants identified in the phecode cohort did not replicate in the histologically confirmed cohorts. The different case definitions led to large differences in sample size and in the variants associated at the genome-wide level. Unvalidated and imprecise case definitions can lead to less accurate results. Investigators should use validated phenotypes when gold-standard definitions are not available.
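The phecode case rule and the replication comparison described above can be illustrated with a minimal sketch. It assumes a hypothetical list of (patient ID, phecode) records and a hypothetical placeholder code `MELANOMA_PHECODE`, because the specific melanoma phecodes are not given here. It also counts raw code occurrences rather than distinct visit dates, and it treats "replication" as reaching genome-wide significance in the confirmed cohort. Finally, it uses the conventional p < 5 × 10^-8 genome-wide threshold. All of these are assumptions for illustration, not the study's documented procedure.

```python
from collections import Counter

# Hypothetical placeholder: the study's actual melanoma phecode(s) are not listed here.
MELANOMA_PHECODES = {"MELANOMA_PHECODE"}
# Conventional genome-wide significance threshold; assumed, not stated in the abstract.
GENOME_WIDE_P = 5e-8


def phecode_cases(records, codes=MELANOMA_PHECODES, min_count=2):
    """Return IDs of patients with at least `min_count` qualifying phecode records.

    Assumption: each record counts once; the study may instead require distinct dates.
    """
    counts = Counter(pid for pid, code in records if code in codes)
    return {pid for pid, n in counts.items() if n >= min_count}


def significant_variants(pvalues, threshold=GENOME_WIDE_P):
    """Return IDs of variants whose association p-value falls below the threshold."""
    return {vid for vid, p in pvalues.items() if p < threshold}


def replication_fraction(discovery, replication):
    """Fraction of discovery-significant variants that are also significant in another cohort."""
    if not discovery:
        return 0.0
    return len(discovery & replication) / len(discovery)


if __name__ == "__main__":
    # Toy inputs only; these are not data from the study.
    records = [
        ("p1", "MELANOMA_PHECODE"),
        ("p1", "MELANOMA_PHECODE"),
        ("p2", "MELANOMA_PHECODE"),
        ("p3", "OTHER_PHECODE"),
    ]
    print(phecode_cases(records))  # {'p1'}: only p1 has two melanoma records

    phecode_p = {"rs1": 1e-9, "rs2": 3e-8, "rs3": 0.01}
    invasive_p = {"rs1": 2e-10, "rs2": 0.2, "rs3": 0.5}
    disc = significant_variants(phecode_p)
    rep = significant_variants(invasive_p)
    print(replication_fraction(disc, rep))  # 0.5
```

A low replication fraction in a comparison like this corresponds to the pattern reported above, where most phecode-cohort variants were not significant in the histologically confirmed cohorts.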