Abstract

Background: Tandem repeats (TRs) are highly prone to variation in copy number due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing approaches, which are either low-throughput or restricted to a small subset of TRs. Here, we demonstrate a targeted sequencing approach combined with Nanopore sequencing to overcome these limitations.

Methods: We selected 142 TR targets and enriched these regions using the Agilent SureSelect target enrichment approach with only 200 ng of input DNA. We barcoded the enriched products and sequenced them on an Oxford Nanopore MinION sequencer. We used VNTRTyper and Tandem-genotypes to genotype TRs from long-read sequencing data. Gold standard PCR sizing analysis was used to validate genotyping results from targeted sequencing data.

Results: We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X coverage per sample with 200 ng of input DNA per sample. We successfully genotyped an average of 75% of targets. For targets with length less than 2 kb and GC content greater than 25%, the genotyping rate increased to 91% in the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and highly correlated with alleles estimated from whole genome long-read sequencing.

Conclusions: We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs, with accuracy comparable to PCR sizing analysis. Our approach can feasibly be scaled to more targets and more samples, facilitating large-scale analysis of TRs.

Highlights

  • Repeated sequences occur in multiple copies throughout the genome; they make up almost half of the human genome [1]

  • Tandem repeats (TRs) can be further divided into two types based on the length of the repeat unit; repeats with one to six base pair repeat units are classified as microsatellites or short tandem repeats (STRs) and those with more than six base pair repeat units are known as minisatellites [4]

  • This is the first report on genotyping analysis of hundreds of TRs using a targeted long-read sequencing approach


Summary

Introduction

Repeated sequences occur in multiple copies throughout the genome; they make up almost half of the human genome. We demonstrate the targeted sequence capture of repetitive TRs using Oxford Nanopore long-read sequencing technologies.

PCR analysis of VNTRs

A total of 10 targeted VNTR regions, each less than 1 kb in repetitive sequence, were validated by PCR sizing analysis in this study (PCR primer sequences provided in Extended data, Supplementary Table 1) [30]. These ten targets include various repeat unit lengths and repeat sequence combinations to assess the accuracy of the genotypes determined from sequencing data. The majority of these targets were tested in our previous study, and the results from the previous PCR analysis were used for these regions.

Genotype rate was calculated as the proportion of (sample, target) pairs which had a predicted genotype (based on VNTRTyper) amongst all targets which met the GC and repeat length thresholds.
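The genotype rate calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code. The thresholds (repeat length below 2 kb, GC content above 25%) are assumed from the abstract, and the `Target` class, the `calls` dictionary layout, and all sample and target names are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: thresholds taken from the abstract (length < 2 kb, GC > 25%).
MAX_REPEAT_LENGTH_BP = 2000
MIN_GC_FRACTION = 0.25


@dataclass(frozen=True)
class Target:
    """Hypothetical description of one TR target."""
    name: str
    repeat_length_bp: int
    gc_fraction: float


# A genotype is a pair of allele copy numbers; None means "no call".
Genotype = Tuple[int, int]
Calls = Dict[Tuple[str, str], Optional[Genotype]]


def passes_thresholds(target: Target) -> bool:
    """True if the target meets both the length and GC thresholds."""
    return (target.repeat_length_bp < MAX_REPEAT_LENGTH_BP
            and target.gc_fraction > MIN_GC_FRACTION)


def genotype_rate(samples: List[str], targets: List[Target], calls: Calls) -> float:
    """Fraction of eligible (sample, target) pairs that have a genotype call."""
    eligible = [t for t in targets if passes_thresholds(t)]
    total_pairs = len(samples) * len(eligible)
    if total_pairs == 0:
        raise ValueError("no eligible (sample, target) pairs")
    genotyped = sum(
        1
        for s in samples
        for t in eligible
        if calls.get((s, t.name)) is not None
    )
    return genotyped / total_pairs


# Invented example data (not from the study).
samples = ["sample1", "sample2"]
targets = [
    Target("TR_A", 500, 0.40),   # eligible
    Target("TR_B", 3000, 0.50),  # too long
    Target("TR_C", 800, 0.20),   # GC too low
    Target("TR_D", 1500, 0.60),  # eligible
]
calls: Calls = {
    ("sample1", "TR_A"): (3, 4),
    ("sample1", "TR_D"): None,    # no genotype predicted
    ("sample2", "TR_A"): (2, 2),
    ("sample2", "TR_D"): (5, 6),
    ("sample1", "TR_B"): (1, 1),  # ignored: target fails the length threshold
}

print(genotype_rate(samples, targets, calls))  # 3 of 4 eligible pairs -> 0.75
```

Filtering targets before counting keeps the denominator restricted to pairs that could reasonably be genotyped, so calls on ineligible targets do not inflate the rate.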

