Background: Langerhans cell histiocytosis (LCH) is an inflammatory myeloid neoplasia characterized by lesions including pathogenic CD207+ dendritic cells among an inflammatory infiltrate. Sequencing studies have identified recurrent, mutually exclusive somatic activating mutations in MAPK pathway genes in ~85% of LCH lesions, including BRAF V600E in 50-65%. Despite advances in elucidating the somatic mutational landscape underlying LCH pathogenesis, germline risk factors remain largely unknown. We previously conducted a genome-wide association study of LCH and identified and validated a risk variant in SMAD6 (rs12438941, G->A) associated with a 3.7-fold increase in LCH risk. SMAD6 inhibits bone morphogenetic protein and transforming growth factor-beta (TGF-β)/activin signaling, which are determinants of physiologic myeloid dendritic cell differentiation, and is in close proximity to MAP2K1. Notably, this risk allele is more common in Hispanics, who are at the highest risk of developing LCH, and absent in those of African ancestry, who experience the lowest risk of LCH. Our current objective is to further characterize the risk of LCH associated with SMAD6 variation. Methods: Four hundred sixty-seven patients diagnosed with LCH were recruited at Texas Children's Hospital. Targeted sequencing of SMAD6 was performed in cases at ~200x coverage at the Avera Institute of Human Genetics. We accessed aggregate-level sequence data at 940 overlapping SMAD6 loci from a median of 15,694 non-cancer controls in gnomAD v2.1.1. Variants were pruned using SNPclip to remove those in high linkage disequilibrium (LD; r² > 0.8; n = 216 removed), with low minor allele frequency (MAF < 0.01; n = 326 removed), absent from 1000 Genomes (n = 240 removed), non-biallelic (n = 3 removed), or absent from dbSNP155 (GRCh37; n = 1 removed). 
Fisher's exact test was applied to 154 SMAD6 variants, with a Bonferroni-corrected critical P-value of 3.2 × 10⁻⁴, to assess association between SMAD6 variation and case status. Sequence data were analyzed in SAS v9.2. Results: In this case-control analysis, 49 variants within SMAD6 were significantly associated with LCH risk (P-values ranging from 4.54 × 10⁻²⁸ to 2.9 × 10⁻⁴), including our previously identified risk locus (P = 8.55 × 10⁻¹⁴). Most (48) of the hits were intronic, and one was in the 5' UTR (P = 2.88 × 10⁻⁵). The top 4 SNPs had low MAF globally (range 2-9%) but were enriched in the 1000 Genomes Admixed American (AMR) superpopulation (range 15-23%). The top two SNPs are in high LD in some populations (r²: MXL (Mexican ancestry from Los Angeles) = 0.71, ASW (Americans of African Ancestry in the Southwest USA) = 1.0, PEL (Peruvian from Lima, Peru) = 0.92), but not globally. The third SNP was also in moderate to high LD with the peak SNP in some populations (r²: MXL = 0.28, ASW = 0.83, PEL = 0.51). The fourth SNP, the variant previously identified in our LCH GWAS, was in only moderate LD with the peak SNP in some populations (r²: MXL = 0.31, CLM (Colombian from Medellín, Colombia) = 0.3, PEL = 0.13), suggesting that these may be independent signals and/or that specific SNPs contribute to LCH risk differentially based on ancestry. Conclusions: We identified additional support for associations of germline SMAD6 variants with LCH susceptibility. Our previous risk locus was again enriched among those who develop LCH, and a cluster of loci surrounding our risk locus was identified using this sequence-based approach. Next steps include functional annotation of top hits, analyses to assess whether top hits are associated with LCH patient or clinical characteristics (e.g., genomic ancestry, BRAF V600E mutation status, risk organ involvement), and burden analysis to investigate highly penetrant variants. 
Overall, this study suggests potential contributions of germline genomics in LCH pathogenesis that may identify a functional role for SMAD6 and account for different incidence rates according to race/ethnicity.
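The variant-filtering arithmetic and the per-variant test described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code (the abstract reports that the analysis was run in SAS v9.2). The family-wise alpha of 0.05 is an assumption, since the abstract gives only the resulting threshold, and the allele counts in the example table are hypothetical, not study data. The helper name `fisher_exact_two_sided` is likewise introduced here for illustration only.

```python
from math import comb

# Variant filtering counts as reported in the Methods.
loci_start = 940
removed = {
    "high LD (r2 > 0.8)": 216,
    "MAF < 0.01": 326,
    "absent from 1000 Genomes": 240,
    "non-biallelic": 3,
    "absent from dbSNP155": 1,
}
variants_tested = loci_start - sum(removed.values())

# Bonferroni correction; alpha = 0.05 is an assumption, not stated in the abstract.
alpha = 0.05
bonferroni_threshold = alpha / variants_tested


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical allele counts (minor, major) for one variant; NOT study data.
case_minor, case_major = 40, 160
control_minor, control_major = 60, 940
p_value = fisher_exact_two_sided(case_minor, case_major, control_minor, control_major)

print(f"variants tested: {variants_tested}")
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"example P-value: {p_value:.2e}, significant: {p_value < bonferroni_threshold}")
```

With aggregate-level controls such as gnomAD, only allele counts per variant are available, which is why the sketch works from a 2x2 table of counts rather than individual genotypes.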