Abstract

Bioinformatics tools for analyzing copy number variants (CNVs) from massively parallel sequencing (MPS) data are less well developed than those for other variant types. We present an efficient bioinformatics pipeline for CNV detection from gene panel MPS data in neuromuscular disorders. CNVs were generated in silico in samples sequenced with a previously published MPS gene panel. The in silico CNVs from these samples were analyzed with four programs having complementary CNV detection ranges: CoNIFER, XHMM, ExomeDepth, and CODEX. A logistic regression model was trained on the resulting set of in silico CNV detections to predict true-positive CNV detections among all CNV detections from the samples. This model was validated using 66 control samples, each with a verified true-positive (n=58) or false-positive (n=8) CNV detection. Applying all four programs together detected in silico CNVs with higher sensitivity than any other program combination or any single program. Furthermore, a model with CNV detection-specific scores from all four programs as variables performed best overall in the validation. No single program could detect all CNV sizes and types equally well or with sufficient accuracy. Therefore, a combination of carefully selected programs should be used to maximize detection accuracy. In addition, the detected CNVs should be reviewed with a statistical model to streamline and standardize the filtering of detections for annotation.
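To make the filtering step concrete, the following is a minimal sketch of a logistic regression that takes one score per program for each CNV detection and returns the probability that the detection is a true positive. It is an illustration only and not the authors' implementation: the abstract does not specify how the scores are scaled or how the model was fitted, so the [0, 1] score range, the synthetic training data, the plain gradient-descent fit, and all function names here are assumptions.

```python
import math
import random

# Program names come from the abstract; everything else is hypothetical.
PROGRAMS = ("CoNIFER", "XHMM", "ExomeDepth", "CODEX")


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, scores):
    # Probability that a detection with these per-program scores is a true positive.
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))


def log_loss(weights, bias, X, y):
    # Mean negative log-likelihood of the labels under the model.
    eps = 1e-12
    total = 0.0
    for scores, label in zip(X, y):
        p = predict(weights, bias, scores)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, lr=0.5, epochs=1000):
    # Batch gradient descent on the mean log-loss.
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for scores, label in zip(X, y):
            err = predict(weights, bias, scores) - label
            for j, s in enumerate(scores):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_synthetic(n, rng):
    # ASSUMPTION: synthetic stand-in for in silico CNV detections.
    # True positives (label 1) tend to score higher in every program.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        center = 0.7 if label else 0.3
        scores = [min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in PROGRAMS]
        X.append(scores)
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_synthetic(200, rng)

initial_loss = log_loss([0.0] * len(PROGRAMS), 0.0, X, y)
weights, bias = train(X, y)
final_loss = log_loss(weights, bias, X, y)

# Score two hypothetical detections: strong vs. weak support from all four programs.
p_high = predict(weights, bias, [0.9, 0.9, 0.9, 0.9])
p_low = predict(weights, bias, [0.1, 0.1, 0.1, 0.1])
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"P(true positive): strong support {p_high:.3f}, weak support {p_low:.3f}")
```

In a real pipeline the rows of `X` would be the detection-specific scores reported by the four programs, with a defined value for programs that did not report a given CNV, and a probability cutoff chosen on validation samples would decide which detections proceed to annotation.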
