Abstract
Long-read sequencing has been shown to have advantages in structural variation (SV) detection and methylation calling. Many studies focus on only one of SVs, methylation, or the phasing of single-nucleotide variants (SNVs); however, only the combination of these variant types provides comprehensive insight into a sample and thus enables novel findings in biology or medicine. PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long phase blocks even on low-coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection. PRINCESS is publicly available at https://github.com/MeHelmy/princess under the MIT license.
Highlights
Long-read sequencing (LRS) is becoming more broadly available across sequencing centers and smaller academic institutions [1]
PRINCESS consists of multiple stages: (i) initial data quality control, (ii) alignment of the reads, (iii) identification of SNVs and indels, (iv) identification of structural variation (SV), (v) filtering of variants, (vi) joint phasing of SNVs, indels, and SVs, and (vii) reporting of the results
To ease the use of PRINCESS, we have incorporated preset parameters that optimize the analysis for the three major long-read technologies: PacBio continuous long reads (CLR), PacBio High Fidelity (HiFi), and Oxford Nanopore (ONT)
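The stage order and platform presets described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Stage`, `PLATFORMS`, and `plan` are hypothetical and are not taken from the PRINCESS code base or its command-line interface.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven stages listed above, in execution order (names are illustrative)."""
    QUALITY_CONTROL = 1
    ALIGNMENT = 2
    SMALL_VARIANT_CALLING = 3   # SNVs and indels
    SV_CALLING = 4
    FILTERING = 5
    PHASING = 6                 # SNVs, indels, and SVs phased together
    REPORTING = 7


# Hypothetical preset keys for the three technologies named in the text.
PLATFORMS = ("clr", "hifi", "ont")


def plan(platform, start=Stage.QUALITY_CONTROL):
    """Return the ordered stages to run for a platform, beginning at `start`."""
    if platform.lower() not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    return [stage for stage in Stage if stage >= start]
```

For example, `plan("ont")` returns all seven stages in order, while `plan("hifi", Stage.FILTERING)` returns only filtering, phasing, and reporting, which mirrors resuming a workflow from an intermediate step.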
Summary
Long-read sequencing (LRS) is becoming more broadly available across sequencing centers and smaller academic institutions [1]. The detection of small variants (SNVs and indels, typically 1–50 bp), SVs (50+ bp: deletions, duplications, insertions, inversions, and translocations), and methylation differences provides important insights into genomics and genetics [20, 21, 22]. Each of these classes of genomic variation has been shown to be an important driver of evolution, diversity, and disease or phenotypic change [6, 23, 24]. We highlight PRINCESS’s capability to improve variant identification across 193 medically relevant regions that are difficult to assess with short-read sequencing and therefore often escape detection [38].
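The size-based distinction between small variants and SVs given above can be written as a short helper. This is a minimal sketch, not part of PRINCESS: the function name `classify_by_length` is hypothetical, and because the text gives both "1–50 bp" and "50+ bp", the code assumes that a variant of exactly 50 bp counts as an SV.

```python
SV_MIN_LENGTH = 50  # assumed boundary in bp: 50 and above is treated as an SV


def classify_by_length(length_bp):
    """Classify a variant by its length in base pairs.

    Returns "small variant" (SNVs and indels) below the threshold,
    and "SV" at or above it.
    """
    if length_bp < 1:
        raise ValueError("variant length must be at least 1 bp")
    if length_bp < SV_MIN_LENGTH:
        return "small variant"
    return "SV"
```

A single-base substitution (`classify_by_length(1)`) and a 12 bp indel both fall into the small-variant class, while a 300 bp deletion is classified as an SV.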