Next-generation sequencing (NGS) offers a cost-effective, high-throughput alternative to other genotyping methodologies, making it a method of choice for pharmacogenetic testing. However, short-read sequencing technologies can struggle to detect certain complex variants, such as large insertions/deletions (indels) and repeat polymorphisms. These limitations can be overcome through careful design of sequencing probes, an optimized NGS library preparation process, and customized bioinformatic analysis pipelines. UGT1A1 is an important pharmacogene that affects the response to multiple drugs, including the HIV drug atazanavir. The TA repeat polymorphism rs3064744 is found across geographic, ethnic, and racial populations and reduces gene activity, resulting in toxicity and dosage effects. NUDT15 plays an important role in the metabolism of thiopurines. The rs746071566 microsatellite polymorphism, c.38GAGTCG[4] (p.13GV[4]), plays a critical role in thiopurine-induced adverse reactions and is commonly observed in pan-ethnic populations. The current study evaluated the performance of a targeted NGS capture and bioinformatics platform in identifying the promoter (TA)7 repeat polymorphism in UDP-glucuronosyltransferase UGT1A1, as well as a 6 bp insertion variant in NUDT15. Both variants were analyzed as part of a 455-gene panel used in a preventative genomics assay. The study comprised 413 validation samples analyzed for UGT1A1 (rs3064744) and 186 analyzed for NUDT15 (rs746071566). Specimens came from several sources: DNA from the Coriell Repository, and DNA isolated from whole blood and saliva with the Maxwell RSC Blood DNA Kit (Promega, Madison, Wisconsin) on the automated Maxwell RSC 48 Instrument (Promega).
Sequencing was performed using a custom-designed targeted capture panel and the Fast Hybridization Target Enrichment workflow (Twist Bioscience, South San Francisco, CA). Libraries were prepared from 50 ng of input DNA using the Twist EF Library Prep Kit 2.0 + UDIs and Fast Hyb reagents (Twist Bioscience), and run at 12 samples per pool and 36 samples per run on the MiniSeq System (Illumina, San Diego, CA). Selected samples were confirmed by Sanger sequencing. Data were analyzed using the Edico Dragen server (Illumina) and an in-house-developed proprietary pipeline. For the purposes of this study, the reference ranges for UGT1A1 were defined as *1 AT[7], *28 AT[8], *36 AT[6], and *39 AT[9]. The genotype frequencies detected for UGT1A1 rs3064744 were as follows: AT[7]/AT[6] 0.97%, AT[7]/AT[7] 62%, AT[7]/AT[8] 24%, AT[7]/AT[9] 0.97%, AT[8]/AT[8] 11.8%, and AT[9]/AT[9] 0.24%. For NUDT15 rs746071566, they were: GGAGTC[1]/GGAGTC[1] 99.5%, GGAGTC[1]/GGAGTC[2] 0.5%, and GGAGTC[1]/GGAGTC[0] 0%. These frequencies are comparable to those published in the gnomAD database. Additionally, Sanger confirmation of selected Coriell samples showed 100% concordance between NGS and Sanger sequencing data. As previously reported, ∼14% of variants (eg, indels, small CNVs, and complex changes) identified during NGS analysis pose technical difficulties that decrease detection rates. Both variants investigated in this study (UGT1A1 rs3064744 and NUDT15 rs746071566) belong to this group of challenging variants and thus require specific consideration during analytical and bioinformatic design and validation. The NGS capture presented in this study was iteratively designed to optimize sequencing across these regions. Performance for these variants depends on both the wet-laboratory procedure and bioinformatic variant assembly, because standard variant-calling software packages do not account for repetitive elements.
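To illustrate why repeat-aware handling matters, the following is a minimal Python sketch of how the number of tandem repeat units (for example AT or GGAGTC) could be counted in an allele sequence. It is not the study's proprietary pipeline; the function name `count_tandem_repeats` and the example sequences are hypothetical and chosen only for illustration.

```python
import re


def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the number of unit copies in the longest uninterrupted
    tandem run of `unit` found anywhere in `sequence`.

    Hypothetical helper for illustration; not part of the study's pipeline.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    sequence = sequence.upper()
    unit = unit.upper()
    # A lookahead lets runs be tested at every start position, so a longer
    # run that overlaps an earlier, shorter one is not missed.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    longest = max(
        (len(match.group(1)) for match in pattern.finditer(sequence)),
        default=0,
    )
    return longest // len(unit)


# Made-up flanking bases around seven AT units and two GGAGTC units.
ugt1a1_like = "CC" + "AT" * 7 + "GG"
nudt15_like = "AAGGAGTCGGAGTCTT"
print(count_tandem_repeats(ugt1a1_like, "AT"))      # 7
print(count_tandem_repeats(nudt15_like, "GGAGTC"))  # 2
```

Counting units directly from the allele sequence, rather than relying on the left-aligned indel a generic caller emits, is one way to obtain a repeat count such as AT[7] or GGAGTC[2].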
The in-house-developed bioinformatic script accurately translated the output of the VCF pipeline into genotype calls. The agreement of the observed genotype frequencies with the data published in gnomAD further confirmed the accuracy of the assay. Sanger sequencing of selected samples corroborated the NGS results.
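The translation from repeat counts to genotype calls can be sketched as follows, using the UGT1A1 definitions stated for this study (*1 AT[7], *28 AT[8], *36 AT[6], *39 AT[9]). This is an assumed illustration in Python, not the authors' script: the names `star_allele`, `diplotype`, and `genotype_frequencies` are hypothetical, and the sample calls are invented rather than taken from the study data.

```python
from collections import Counter

# Allele definitions as stated in this study (AT repeat count -> star allele).
UGT1A1_STAR_BY_AT_COUNT = {6: "*36", 7: "*1", 8: "*28", 9: "*39"}


def star_allele(at_count: int) -> str:
    """Map an AT repeat count to a star allele; unknown counts stay as AT[n]."""
    return UGT1A1_STAR_BY_AT_COUNT.get(at_count, f"AT[{at_count}]")


def diplotype(count_a: int, count_b: int) -> str:
    """Build a diplotype label, ordering the two alleles by repeat count."""
    low, high = sorted((count_a, count_b))
    return f"{star_allele(low)}/{star_allele(high)}"


def genotype_frequencies(calls: list[str]) -> dict[str, float]:
    """Return the percentage of samples carrying each diplotype."""
    total = len(calls)
    if total == 0:
        return {}
    return {g: 100.0 * n / total for g, n in Counter(calls).items()}


# Invented per-sample repeat counts, for illustration only.
samples = [(7, 7), (7, 7), (8, 7), (8, 8)]
calls = [diplotype(a, b) for a, b in samples]
print(calls)                        # ['*1/*1', '*1/*1', '*1/*28', '*28/*28']
print(genotype_frequencies(calls))  # {'*1/*1': 50.0, '*1/*28': 25.0, '*28/*28': 25.0}
```

Frequencies tabulated this way can then be compared against a population reference such as gnomAD.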