Abstract

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples using long-read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ancestry. Our analysis shows that Paragraph has better accuracy than other existing genotypers and can be applied to population-scale studies.

Highlights

  • Structural variants (SVs) contribute to a large fraction of genomic variation and have long been implicated in phenotypic diversity and human disease [1,2,3]

  • Recent evidence suggests that a significant fraction of novel SVs could be tandem repeats with variable lengths across the population [31, 32], and we found that 49% of the singleton unique SVs are completely within the UCSC Genome Browser Tandem Repeat (TR) tracks while 93% of the clustered unique SVs are within TR tracks

  • Genotyping in tandem repeats: most of the SVs with breakpoint deviations between the continuous long reads (CLR) calls and the long-read ground truth (LRGT) are in low-complexity regions; of the 8069 matching SVs with breakpoint deviations, 3217 (77%) are within TRs
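
The "completely within TR tracks" figures in the highlights above come down to an interval-containment test. The following is a minimal sketch of that kind of calculation, not the authors' pipeline: the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration, and the intervals are taken to lie on a single chromosome.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_within(svs: List[Interval], repeats: List[Interval]) -> float:
    """Fraction of SVs that lie completely inside the merged repeat intervals."""
    merged = merge_intervals(repeats)
    starts = [s for s, _ in merged]
    contained = 0
    for sv_start, sv_end in svs:
        # Rightmost merged repeat that begins at or before the SV start.
        i = bisect_right(starts, sv_start) - 1
        if i >= 0 and merged[i][1] >= sv_end:
            contained += 1
    return contained / len(svs) if svs else 0.0


# Toy data only; these are not coordinates from the paper.
toy_repeats = [(100, 200), (150, 300), (500, 600)]
toy_svs = [(120, 280), (290, 310), (510, 590), (700, 800)]
toy_fraction = fraction_within(toy_svs, toy_repeats)
```

Merging the repeat annotations first means an SV that spans two overlapping or abutting repeat records still counts as contained. Whether that matches the definition used in the study is an assumption here.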


Summary

METHOD

Paragraph: a graph-based structural variant genotyper for short-read sequence data

Sai Chen1†, Peter Krusche2,3†, Egor Dolzhenko, Rachel M.

