Blood typing is essential for safe transfusions and is performed serologically or genetically. Genotyping predominantly focuses on coding regions, but non-coding variants may affect gene regulation, as demonstrated in the ABO, FY, and XG systems. To uncover regulatory loci, we expanded a recently developed bioinformatics pipeline for discovery of non-coding variants by including additional epigenetic datasets. The datasets analyzed included ChIP-seq for erythroid transcription factors (TFs), histone modifications (H3K27ac, H3K4me1), and chromatin accessibility (ATAC-seq). Candidate regulatory regions were investigated for activity (luciferase assays) and TF binding (electrophoretic mobility shift assay, EMSA, and mass spectrometry, MS). In total, 814 potential regulatory sites bound by one or more erythroid TFs were identified in 47 blood-group-related genes. Enhancer candidates in CR1, EMP3, ABCB6, and ABCC4, indicated by ATAC-seq, histone marks, and co-occupancy of four TFs (GATA1/KLF1/RUNX1/NFE2), were investigated, but only those in CR1 and ABCC4 showed increased transcription. Co-occupancy of GATA1 and KLF1 was observed in the KEL promoter, previously reported to contain GATA1 and Sp1 sites. TF binding energy scores decreased when three naturally occurring variants were introduced into GATA1 and KLF1 motifs. Two of three GATA1 sites and the KLF1 site were confirmed functionally. EMSA and MS demonstrated stronger GATA1 and KLF1 binding to wild-type than to variant motifs. This combined bioinformatics and experimental approach revealed multiple candidate regulatory regions and predicted TF co-occupancy sites. The KEL promoter was characterized in detail, indicating that two adjacent GATA1 and KLF1 motifs are most crucial for transcription.
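The co-occupancy criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each dataset has already been reduced to peak intervals, and the function names (`overlaps`, `bound_factors`, `co_occupied`) and all coordinates are hypothetical toy values chosen only for illustration. A candidate region counts as co-occupied when at least one peak from every required TF overlaps it.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def bound_factors(region: Interval, peaks: Dict[str, List[Interval]]) -> List[str]:
    """Names of TFs with at least one peak overlapping the region, sorted."""
    return sorted(
        tf for tf, intervals in peaks.items()
        if any(overlaps(region, p) for p in intervals)
    )


def co_occupied(
    regions: List[Interval],
    peaks: Dict[str, List[Interval]],
    required: List[str],
) -> List[Interval]:
    """Regions overlapped by peaks from every TF listed in `required`."""
    needed = set(required)
    return [r for r in regions if needed <= set(bound_factors(r, peaks))]


# Toy data only: hypothetical accessible regions and TF peaks.
regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = {
    "GATA1": [("chr1", 150, 250), ("chr1", 550, 580)],
    "KLF1": [("chr1", 90, 110), ("chr2", 100, 200)],
    "RUNX1": [("chr1", 180, 220)],
    "NFE2": [("chr1", 199, 300)],
}

hits = co_occupied(regions, peaks, ["GATA1", "KLF1", "RUNX1", "NFE2"])
print(hits)
```

With these toy inputs only the first region is overlapped by all four factors. On real data the pairwise scan would typically be replaced by an interval tree or a dedicated tool, since it is quadratic in the number of peaks.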