Abstract

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and are associated with clinical disorders. It has been difficult to incorporate VNTR analysis into disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and the repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition from short reads. We use this approach to discover VNTRs whose length is stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

Highlights

  • Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition

  • The method, Tandem Repeat Genotyping based on Haplotype-derived Pangenome Graphs, identifies VNTR boundaries in assemblies, constructs a repeat-pangenome graph (RPGG), aligns short-read sequencing (SRS) reads to the RPGG, and infers VNTR motif composition and length in SRS samples

  • We developed a pipeline that partitions long-read sequencing (LRS) reads by haplotype based on phased heterozygous single-nucleotide variants (SNVs) and assembles haplotypes separately by chromosome
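
The read-partitioning step in the last highlight can be illustrated with a minimal sketch. It assumes each read has already been reduced to the bases it shows at known SNV positions, and it assigns a read to whichever haplotype matches more of those bases. The function names, the input layout, and the majority-vote rule are all hypothetical choices made for illustration; the source does not describe how the pipeline implements this step.

```python
def assign_haplotype(read_alleles, phased_snvs):
    """Return 1 or 2 for the haplotype a read supports, or None if undecided.

    read_alleles: {position: base} observed in the read (hypothetical layout).
    phased_snvs:  {position: (hap1_base, hap2_base)} for heterozygous SNVs.
    """
    votes = [0, 0]
    for pos, base in read_alleles.items():
        if pos not in phased_snvs:
            continue  # position is not a phased heterozygous SNV
        hap1_base, hap2_base = phased_snvs[pos]
        if base == hap1_base:
            votes[0] += 1
        elif base == hap2_base:
            votes[1] += 1
    if votes[0] == votes[1]:
        return None  # no informative SNVs, or conflicting evidence
    return 1 if votes[0] > votes[1] else 2


def partition_reads(reads, phased_snvs):
    """Group read names into haplotype 1, haplotype 2, and unassigned bins."""
    bins = {1: [], 2: [], None: []}
    for name, alleles in reads.items():
        bins[assign_haplotype(alleles, phased_snvs)].append(name)
    return bins


# Toy data: three phased SNVs and four reads (all values are made up).
phased_snvs = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "r1": {100: "A", 250: "C"},  # matches haplotype 1 at both sites
    "r2": {250: "T", 400: "A"},  # matches haplotype 2 at both sites
    "r3": {100: "A", 250: "T"},  # one vote each, left unassigned
    "r4": {500: "C"},            # covers no phased SNV, left unassigned
}
bins = partition_reads(reads, phased_snvs)
```

Each haplotype bin could then be passed to an assembler separately, which is the role the highlight describes for the partitioned reads.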


Summary

Introduction

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and are associated with clinical disorders. We develop software to build an RPGG and use the RPGG to estimate VNTR composition from short reads. We use this approach to discover VNTRs whose length is stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease. Graph analysis has been used to encode the elementary duplication structure of a genome[29] and for multiple-sequence alignment of repetitive sequences with shuffled domains[30], making graphs well suited to represent VNTRs that differ in both repeat count and composition.
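
To make the idea of a graph that captures both repeat count and composition more concrete, here is a minimal sketch. It assumes a de Bruijn-style representation in which nodes are k-mers and edges connect consecutive k-mers; the text above does not specify the graph type, so this is an illustrative assumption. The function names, the toy sequences, and the exact-match k-mer counting are likewise hypothetical and stand in for, rather than reproduce, the authors' software.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq in left-to-right order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_rpgg(haplotypes, k):
    """Build a k-mer graph from several haplotype sequences of one locus.

    Repeated motif copies collapse onto the same nodes (forming cycles),
    while motif variants seen in only some haplotypes add branches.
    """
    graph = defaultdict(set)
    for seq in haplotypes:
        prev = None
        for km in kmers(seq, k):
            graph[km]  # touch the key so nodes without successors exist
            if prev is not None:
                graph[prev].add(km)
            prev = km
    return dict(graph)


def count_read_kmers(graph, reads, k):
    """Count how often each graph node occurs as an exact k-mer in the reads."""
    counts = {node: 0 for node in graph}
    for read in reads:
        for km in kmers(read, k):
            if km in counts:
                counts[km] += 1
    return counts


# Two toy haplotypes of one locus: flank + repeat units + flank.
hap_a = "GGC" + "ACT" * 3 + "TTG"                  # three ACT copies
hap_b = "GGC" + "ACT" * 2 + "ACG" + "ACT" + "TTG"  # one variant unit (ACG)
graph = build_rpgg([hap_a, hap_b], k=4)

# Toy short reads; counts per node act as a crude proxy for repeat dosage.
reads = ["ACTACTAC", "CTACGACT"]
counts = count_read_kmers(graph, reads, k=4)
```

In this toy graph the ACT repeat forms a cycle (ACTA to CTAC to TACT and back to ACTA), and the ACG variant appears as a branch leaving CTAC. Counting read k-mers per node is one simple way such a graph could relate short reads to repeat length and motif composition.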
