BACKGROUND
The severity of sickle cell anemia (SCA) has been associated with five specific haplotypes in the beta-globin cluster, identifiable by distinct patterns of restriction fragment length polymorphism (RFLP). These RFLP-defined haplotypes are named according to the region where they were first discovered: Central Africa region (CAR), Benin, Senegal, Arabic-Indian, and Cameroon. They suggested that the sickle cell mutation arose recently at least four independent times in Africa, each time on a different genetic background, and again in India (the Multi-centric Sickle Cell Model). The beta-globin locus, however, is prone to extensive structural rearrangements. Together with observations of ancestral sequence haplotypes extending several hundred kilobases beyond the beta-globin cluster, this has led to the suggestion that there is more cis-variation in the beta-globin gene cluster than is evident from RFLP analyses, and that current definitions of RFLP haplotypes may not accurately reflect the ancestral origin of the sickle mutation. This undescribed cis-variation is important to understanding the inter-individual phenotype variation associated with the different haplotypes.

To better define this variation, we have undertaken extended, long-range molecular haplotyping of the beta-globin cluster on differing RFLP haplotype backgrounds using real-time single molecule sequencing (SMS). The SMS approach generates long, unbroken reads with uniform coverage, thereby allowing detailed molecular phasing of RFLP haplotypes and identification of local structural variation that would otherwise be hidden within the complex repetitive elements of the beta-globin cluster.
In so doing, we plan to evaluate the molecular evidence for a single origin of the sickle cell allele and to identify potential cis-acting, disease-modifying candidate variants within the beta-globin cluster.

METHODS
DNA samples from 200 SCA patients from three countries in sub-Saharan Africa (Nigeria, Cameroon, and Kenya) were collected after informed consent. RFLP analysis revealed the majority of the patients to be homozygous for the Benin and CAR haplotypes, with a smaller sampling of Cameroon, Senegal, and Atypical haplotypes. From this group, 40 samples representing the four main classical RFLP haplotypes in sub-Saharan Africa were selected for SMS. PCR was used to tile barcoded amplicons across the cluster. SMS was performed on a Sequel instrument (Pacific Biosciences). The resulting FASTQ reads were de-multiplexed, filtered with read quality control filters, and mapped to the human reference genome Hg38 prior to annotation and calling of single nucleotide variants (SNV) and structural variants (SV).

RESULTS
Long-read analyses of the initial 40 samples have so far identified 227 insertions, 18 deletions, 7 duplications, 76 inversions, and 13 translocations; the vast majority of these variants are not described in public variant databases. Atypical haplotypes demonstrated more translocation and duplication events than other haplotypes. Insertions ~48 kb upstream of the beta-globin locus control region and 1.3 kb upstream of gamma-globin gene 2 (HBG2) were observed on 50% of Senegal and 45% of CAR haplotypes. Also, recurrent insertions ranging between 32 bp and 1,184 bp in length, 2.3 kb downstream of the beta-globin gene (HBB), were seen on 50% of the Benin and 54% of the CAR haplotypes, but only once on Senegal haplotypes.

CONCLUSIONS
Our findings suggest there are more cis-SVs within the beta-globin gene cluster than previously described. Work to validate these SVs using orthogonal methods and to complete similar analyses of SNVs is ongoing.
Variation outside of the gene cluster will also be integrated with RFLP and within-cluster molecular haplotypes to investigate the ancestral origin of the mutation, and validated cis-acting variants will be used to identify markers associated with proxies of SCA disease modification.

Disclosures
No relevant conflicts of interest to declare.