CYP2A6 metabolically inactivates nicotine. Faster CYP2A6 activity is associated with heavier smoking and higher lung cancer risk. The CYP2A6 gene is polymorphic, including functional structural variants (SVs) such as gene deletions (CYP2A6*4), duplications (CYP2A6*1×2), and hybrids with the CYP2A7 pseudogene (CYP2A6*12, CYP2A6*34). SVs are challenging to genotype due to their complex genetic architecture. Our aims were to develop a reliable protocol for SV genotyping, to functionally phenotype known and novel SVs, and to investigate the feasibility of CYP2A6 SV imputation from SNP array data in two ancestry populations. European- (EUR; n = 935) and African- (AFR; n = 964) ancestry individuals from smoking cessation trials were genotyped for SNPs using an Illumina array and for CYP2A6 SVs using TaqMan copy number (CN) assays. SV-specific PCR amplification and Sanger sequencing were used to characterize a novel SV. Individuals with SVs were phenotyped using the nicotine metabolite ratio, a biomarker of CYP2A6 activity. SV diplotype and SNP array data were integrated and phased to generate ancestry-specific SV reference panels. Leave-one-out cross-validation was used to investigate the feasibility of CYP2A6 SV imputation. A minimal protocol requiring three TaqMan CN assays for CYP2A6 SV genotyping was developed, and known SV associations with activity were replicated. The first domain swap CYP2A6-CYP2A7 hybrid SV, CYP2A6*53, was identified, sequenced, and associated with lower CYP2A6 activity. In both EUR and AFR individuals, most SV alleles were identified using imputation (>70% and >60%, respectively); importantly, false positive rates were <1%. These results confirm that CYP2A6 SV imputation can identify most SV alleles, including a novel SV.
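The two imputation figures reported above (the share of SV alleles identified and the false positive rate) can be illustrated with a minimal Python sketch. It assumes each allele carries an assay-based call and a leave-one-out imputed call, both reduced to "SV present" or "SV absent". The function name `imputation_metrics`, the boolean encoding, and the toy data are hypothetical and are not taken from the study, whose exact metric definitions may differ.

```python
from math import nan
from typing import Dict, Sequence


def imputation_metrics(observed: Sequence[bool], imputed: Sequence[bool]) -> Dict[str, float]:
    """Compare imputed SV calls with assay-based calls, one entry per allele.

    True means "allele carries an SV"; False means "no SV".
    This is an illustrative definition, not the study's own code.
    """
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")

    pairs = list(zip(observed, imputed))
    tp = sum(1 for o, i in pairs if o and i)          # SV alleles correctly imputed
    fn = sum(1 for o, i in pairs if o and not i)      # SV alleles missed by imputation
    fp = sum(1 for o, i in pairs if not o and i)      # non-SV alleles imputed as SV
    tn = sum(1 for o, i in pairs if not o and not i)  # non-SV alleles correctly imputed

    # Share of true SV alleles that imputation identified.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # Share of non-SV alleles wrongly imputed as carrying an SV.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else nan
    return {"sensitivity": sensitivity, "false_positive_rate": false_positive_rate}


# Toy data for illustration only (not study data): 3 SV alleles, 5 non-SV alleles.
toy_observed = [True, True, True, False, False, False, False, False]
toy_imputed = [True, True, False, False, False, False, False, True]
toy_result = imputation_metrics(toy_observed, toy_imputed)
```

In a leave-one-out design, each individual's imputed call would come from a reference panel built without that individual, so the comparison is made on data the panel has not seen.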