Abstract
Lung cancer (LC) is the primary cause of cancer-related deaths in the United States. Whereas smoking and other environmental factors strongly increase LC risk, multiple genetic variants also contribute to risk in smokers. Furthermore, some families of smokers show an abnormally high prevalence of LC, suggesting that unknown genetic factors can greatly increase LC risk in smokers. The typically short survival after LC diagnosis impedes collection of detailed genotypic information on any single large family pedigree, which impairs the identification of putative high-risk factors. Therefore, the Genetic Epidemiology of LC Consortium has collected epidemiological and genetic data from families with high numbers of LC cases at eight different sites across the US. In this study, we obtained whole exome sequences (WES) from 290 members of 28 families, including 66 LC cases. We used a gene-based approach to allow for the possibility that different families may carry different variants of the same gene. Variants were filtered by i) allele frequency, ii) functional effect as predicted by combined annotation-dependent depletion (CADD), and iii) location in a gene with either a known or suspected role in cancer. We further selected variants that segregate in family pedigrees in a pattern consistent with a large effect on LC risk. Candidate LC risk genes were then identified as those represented in at least two families by the same or different variants. We further culled the list of genes by requiring the presence of at least one rare, functional variant enriched in the WES of 1060 cases relative to 899 controls from the Transdisciplinary Research on Cancer of the Lung consortium. This analysis narrowed our results to two genes, one being E2A, a member of the E family of bHLH transcription factors.
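The gene-based filtering described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Variant` record, its field names, the function `candidate_genes`, and the default thresholds (`max_af`, `min_cadd`) are all hypothetical, since the abstract does not state the cutoffs used. The gene and family labels are toy placeholders, not study data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-family variant record (field names are assumptions)."""
    gene: str
    family: str
    allele_freq: float   # population allele frequency
    cadd: float          # CADD score for predicted functional effect
    cancer_gene: bool    # gene has a known or suspected role in cancer
    segregates: bool     # pedigree pattern consistent with a large effect


def candidate_genes(variants, max_af=0.01, min_cadd=20.0, min_families=2):
    """Return genes with passing variants in at least `min_families` families.

    The thresholds are illustrative assumptions, not values from the abstract.
    Different variants of the same gene count toward the same gene.
    """
    families_by_gene = defaultdict(set)
    for v in variants:
        passes = (
            v.allele_freq <= max_af
            and v.cadd >= min_cadd
            and v.cancer_gene
            and v.segregates
        )
        if passes:
            families_by_gene[v.gene].add(v.family)
    return sorted(
        gene for gene, fams in families_by_gene.items()
        if len(fams) >= min_families
    )


# Toy input: placeholder genes and families only.
toy = [
    Variant("GENE_A", "F1", 0.001, 25.0, True, True),
    Variant("GENE_A", "F2", 0.0005, 31.0, True, True),  # different variant, same gene
    Variant("GENE_B", "F1", 0.002, 27.0, True, True),   # seen in one family only
    Variant("GENE_C", "F2", 0.001, 24.0, True, True),
    Variant("GENE_C", "F3", 0.001, 8.0, True, True),    # fails the CADD cutoff
    Variant("GENE_D", "F1", 0.20, 30.0, True, True),    # too common
    Variant("GENE_D", "F3", 0.001, 30.0, True, False),  # does not segregate
]

print(candidate_genes(toy))  # ['GENE_A']
```

Grouping by gene rather than by variant is what lets two families with different variants of the same gene support the same candidate. The subsequent case-control enrichment step is not modeled here.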
Whereas loss-of-function mutations in E2A drive lymphoid cancers, the E2A protein also participates in an oncogenic heterodimer with TWIST1 that promotes the epithelial-mesenchymal transition and is implicated in multiple cancer types. Furthermore, the E2A/TWIST1 heterodimer is the primary TWIST1-containing complex implicated in oncogenesis, and silencing of E2A in KRAS-mutant non-small cell lung cancer (NSCLC) cell lines results in oncogene-stimulated senescence and apoptosis. Our data identified three distinct E2A variants, which are present in all 10 sequenced LC cases in the 5 families in which those variants are found. Finally, two of these E2A variants are located only 57 nucleotides from each other, immediately adjacent to sequences encoding a transcription activation domain, suggesting that both variants alter the same specific protein function. These data identify E2A variants as likely high-risk factors for LC in smokers and validate our general approach for identifying genetic factors with a large impact on LC risk.
Citation Format: Claudio W. Pikielny, Anthony M. Musolf, Mariza de Andrade, Diptasri Mandal, Colette Gaba, Ping Yang, Yafang Li, Ming You, Richard Wilson, Elena Y. Kupert, Marshall W. Anderson, Ann G. Schwartz, Susan M. Pinney, Ambrose I. Granizo-Mackenzie, Yanhong Liu, Ramaswamy Govindan, James McKay, Rayjean Hung, John K. Field, David C. Christiani, Joan E. Bailey-Wilson, Christopher I. Amos. Familial studies identify variants in the E2A transcription factor as putative risk factors for lung cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr LB-053.