Abstract

Background: Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for detecting short genetic variations. However, identifying larger variations, such as structural variants and copy number variations (CNVs), remains a challenge for targeted MPS. Several approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type and size, as well as a certain expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germline CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool.

Results: We have developed a machine learning algorithm for detecting large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We performed verification studies and established the algorithm's sensitivity and specificity. We compared the tool with other available methods applicable to this type of data and found that it performed better.

Conclusion: Using a large cohort of samples, we showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1272-6) contains supplementary material, which is available to authorized users.

Highlights

  • Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols

  • Sequencing data from healthy individuals and patients diagnosed with cystic fibrosis, phenylketonuria and galactosemia were obtained from Parseq Lab biobank

  • We found that the variance of amplicon relations varies greatly from experiment to experiment. This makes copy number variation (CNV) detection more effective within a set of samples sequenced in the same run, so we analyzed each dataset separately.
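The run-to-run effect described in the last highlight can be illustrated with a minimal sketch on made-up read counts. The sketch assumes that each amplicon's share of a sample's total reads is a reasonable stand-in for the paper's "amplicon relations". The run names, the counts, and the functions `amplicon_fractions` and `mean_cv` are hypothetical. Samples from the same run share similar amplification biases, so the average coefficient of variation (CV) of those shares should be small within a run and much larger when runs are pooled.

```python
from statistics import mean, pstdev


def amplicon_fractions(counts):
    """Share of a sample's total reads that falls on each amplicon."""
    total = sum(counts)
    return [c / total for c in counts]


def mean_cv(samples):
    """Average coefficient of variation of amplicon shares across samples."""
    shares = [amplicon_fractions(s) for s in samples]
    cvs = []
    for j in range(len(shares[0])):
        column = [s[j] for s in shares]  # share of amplicon j in every sample
        cvs.append(pstdev(column) / mean(column))
    return mean(cvs)


# Hypothetical read counts for three amplicons; each run has its own
# amplification bias, so the coverage profile differs between runs.
runs = {
    "run_A": [[100, 200, 300], [110, 210, 290], [95, 190, 310]],
    "run_B": [[300, 200, 100], [290, 220, 95], [310, 190, 105]],
}

for name, samples in runs.items():
    print(name, round(mean_cv(samples), 3))

pooled = [s for samples in runs.values() for s in samples]
print("pooled", round(mean_cv(pooled), 3))
```

With these toy numbers, each run on its own has a CV of a few percent. When the runs are pooled, the CV rises to roughly 0.5, because the per-run biases get mixed together. This is the situation that per-run analysis avoids.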


Summary

Introduction

Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. Several approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type and size, as well as a certain expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germline CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool. Variability in amplification efficiency during library preparation leads to uneven amplicon coverage from one experiment to another. This limits the use of existing coverage-based CNV detection tools on targeted sequencing (TS) data. Well-known paired-end algorithms that rely on insert size and read orientation are inapplicable to data produced with amplification-based sample preparation techniques [2, 3]. Several approaches for CNV detection are used in clinical diagnostics, but most of them rely on whole-genome sequencing (WGS) or whole-exome sequencing (WES) data and cannot be applied to amplification-based TS [8, 9].
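The following sketch provides a reference point for the coverage-based approach discussed above. It estimates the per-amplicon copy number of one sample relative to reference samples from the same run. This is a generic read-depth calculation, not the machine learning algorithm described in this paper. The function names, the assumed ploidy of 2, the 1.5/2.5 calling thresholds, and the read counts are all illustrative assumptions. The sketch scales each sample by its total read count and compares it with the per-amplicon median of same-run references, which cancels the amplicon-specific bias shared within a run.

```python
from statistics import median


def scale(counts):
    """Divide each amplicon count by the sample's total read count."""
    total = sum(counts)
    return [c / total for c in counts]


def copy_number_estimates(test, references, ploidy=2):
    """Estimate copy number per amplicon relative to same-run references."""
    test_scaled = scale(test)
    ref_scaled = [scale(r) for r in references]
    estimates = []
    for j, value in enumerate(test_scaled):
        ref_median = median(r[j] for r in ref_scaled)  # typical share of amplicon j
        estimates.append(ploidy * value / ref_median)
    return estimates


def call_cnv(estimates, del_cutoff=1.5, dup_cutoff=2.5):
    """Label amplicons with fixed, hypothetical thresholds."""
    calls = []
    for cn in estimates:
        if cn < del_cutoff:
            calls.append("deletion")
        elif cn > dup_cutoff:
            calls.append("duplication")
        else:
            calls.append("normal")
    return calls


# Hypothetical reference samples sequenced in the same run as the test sample.
references = [[100, 200, 300, 400], [110, 190, 310, 390], [95, 205, 290, 410]]

test = [100, 100, 300, 400]  # amplicon 2 has half the expected reads
estimates = copy_number_estimates(test, references)
print([round(cn, 2) for cn in estimates])  # [2.22, 1.11, 2.22, 2.22]
print(call_cnv(estimates))  # ['normal', 'deletion', 'normal', 'normal']
```

A heterozygous deletion roughly halves the scaled coverage, giving an estimate near 1. A single-copy duplication pushes the estimate toward 3. Because this sketch normalizes by the total read count, a deletion slightly inflates the estimates for the remaining amplicons (2.22 instead of 2 here). This is one reason practical tools use more robust normalization.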
