Abstract

Background and Aims
The human genome contains variable number tandem repeats (VNTRs), and a subset of these repeats has been associated with rare human diseases. Specific frameshift variants in the coding VNTR of the MUC1 gene cause autosomal dominant tubulointerstitial kidney disease (ADTKD-MUC1). Calling variants within a VNTR from short-read sequencing (SRS) data is difficult for four reasons: reads map poorly, the motifs are complex (34 distinct 60-mer motifs are known to date), the repeat number varies, and the motifs are highly similar to one another. We recently developed VNtyper, a computational pipeline for the precise detection of disease-causing variants within the MUC1 VNTR from SRS data. With it, we identified previously overlooked cases in a hereditary renal disease registry and diagnosed at least 40 patients. Regular exome sequencing gives low VNTR coverage, and VNtyper performed poorly on such data. We therefore focused on boosting capture of the MUC1 VNTR in the exome, with the aim of enabling ADTKD-MUC1 diagnosis with VNtyper from exome sequencing.

Method
VNtyper uses two independent genotyping algorithms (Kestrel and code-adVNTR) together with a MUC1 VNTR-specific reference sequence for variant detection. Its performance depends on VNTR coverage. We used Twist custom panels with the v1 protocol for target enrichment in both our panel and exome sequencing. This protocol allows spike-in probes to be added without interfering with other targets. We designed an NTI panel targeting 6 ADTKD-associated genes: UMOD, MUC1, HNF1B, REN, SEC61A1, and DNAJB11. These genes were captured with 1x tiling, whereas the MUC1 VNTR region was captured with 4x tiling. During exome target enrichment, we added the NTI probes as a spike-in to boost VNTR coverage.
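To make the difference between 1x and 4x tiling concrete, the sketch below counts how many probes are needed to tile a region at a given density. It is a simplified illustration, not the actual Twist panel design. The probe length of 120 nt, the function name `probes_needed`, and the 1,200-bp example region are all assumptions introduced here.

```python
import math

# Assumption: 120-nt capture probes; the abstract does not state the probe length.
PROBE_LEN = 120


def probes_needed(region_len: int, tiling: int, probe_len: int = PROBE_LEN) -> int:
    """Hypothetical helper: probes required to tile a region at a given density.

    At 1x tiling, probe start positions are spaced probe_len apart (end to end).
    At Nx tiling, they are spaced probe_len / N apart, so each interior base
    is covered by N overlapping probes.
    """
    if region_len <= probe_len:
        return 1
    step = probe_len / tiling
    return math.ceil((region_len - probe_len) / step) + 1


if __name__ == "__main__":
    # Hypothetical 1,200-bp target (20 repeats of a 60-bp motif).
    region = 20 * 60
    for density in (1, 4):
        print(f"{density}x tiling: {probes_needed(region, density)} probes")
```

Under these assumptions, 4x tiling needs roughly four times as many probes as 1x tiling over the same stretch. This is how the design concentrates capture effort on the VNTR. Real probe placement over highly similar repeat motifs involves vendor-side design choices that this sketch does not model.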
We performed routine exome sequencing on 6 samples (3 positive and 3 negative controls). We also performed boosted exome sequencing on 16 samples (13 MUC1-positive samples and 3 negative controls). Before running VNtyper on either exome type, we ran quality controls to confirm that the coverage of the other genes was unaltered.

Results
The mean coverage of the MUC1 VNTR was 72.1x in the regular exome and roughly doubled to 144x in the boosted exome. Adding the spike-in probes did not significantly affect contig coverage and did not alter the heterozygosity ratio. As expected, on regular exome data VNtyper failed to detect the pathogenic variant in 2 of 3 true positives, while all negative controls tested negative. On boosted exome data, VNtyper correctly identified all 13 true positives, and the negative controls remained negative. We also ran downsampling experiments on panel data with a mean coverage of 700x, reducing read depth from 50% to 5% of the total. These showed that VNTR coverage below 100x should be considered low for genotyping with our pipeline, which underscores the importance of adequate coverage for accurate and reliable VNTR genotyping.

Conclusion
Applying VNtyper to clinically boosted exome data improved the accuracy and sensitivity of ADTKD-MUC1 diagnosis. VNtyper can identify some MUC1-positive patients in regular exome data, but there its performance depends on the number of repeats in both alleles and on the motif carrying the variant. This substantially decreases sensitivity.
In summary, the VNtyper pipeline detects pathogenic variants in ADTKD-MUC1 from panel data (100% sensitivity). Improved capture of the MUC1 VNTR in the exome substantially increases the sensitivity of exome-based ADTKD-MUC1 diagnosis.
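The coverage and detection figures in the Results can be checked with simple arithmetic, sketched below. Several simplifications are assumed here. Downsampling is treated as uniform random read subsampling, so expected depth scales linearly with the retained fraction. The 100x threshold is applied as a plain cutoff on mean coverage. The intermediate downsampling fractions are hypothetical, because the abstract only gives the 50% to 5% range. All function names are illustrative. As the Conclusion notes, VNtyper's real behaviour also depends on repeat number and motif, so this is not a model of the pipeline itself.

```python
# Low-coverage cutoff reported in the abstract for VNTR genotyping with VNtyper.
LOW_COVERAGE_THRESHOLD = 100.0


def downsampled_coverage(mean_coverage: float, fraction: float) -> float:
    """Expected mean depth after keeping a random `fraction` of reads (assumes uniform subsampling)."""
    return mean_coverage * fraction


def is_low_coverage(coverage: float, threshold: float = LOW_COVERAGE_THRESHOLD) -> bool:
    """Flag mean VNTR coverage below the threshold as low."""
    return coverage < threshold


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true positives that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of true negatives that tested negative."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Downsampling a 700x panel; intermediate fractions are hypothetical.
    for fraction in (0.50, 0.25, 0.10, 0.05):
        cov = downsampled_coverage(700, fraction)
        status = "low" if is_low_coverage(cov) else "adequate"
        print(f"{fraction:.0%} of reads: ~{cov:.0f}x ({status})")

    # Mean MUC1 VNTR coverage reported for the two exome types.
    for name, cov in (("regular exome", 72.1), ("boosted exome", 144.0)):
        status = "low" if is_low_coverage(cov) else "adequate"
        print(f"{name}: {cov}x ({status})")

    # Detection counts reported in the abstract.
    print(f"regular exome sensitivity: {sensitivity(1, 2):.2f}")   # 1 of 3 detected
    print(f"boosted exome sensitivity: {sensitivity(13, 0):.2f}")  # 13 of 13 detected
    print(f"specificity (both exomes): {specificity(3, 0):.2f}")   # 3 of 3 negative
```

With this simple cutoff, the regular exome mean (72.1x) falls in the low-coverage range and the boosted exome mean (144x) does not. This is consistent with the detection counts reported above.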