Abstract

Bioinformatics tools for analyzing copy number variants (CNVs) from massively parallel sequencing (MPS) data are less well developed than those for other variant types. We present an efficient bioinformatics pipeline for CNV detection from gene panel MPS data in neuromuscular disorders. In silico CNVs were introduced into samples sequenced with a previously published MPS gene panel. These in silico CNVs were analyzed with four programs having complementary CNV detection ranges: CoNIFER, XHMM, ExomeDepth, and CODEX. A logistic regression model was trained on the resulting set of in silico CNV detections to predict which of all CNV detections in the samples are true positives. The model was validated using 66 control samples, each with a verified true-positive (n=58) or false-positive (n=8) CNV detection. Applying all four programs together detected the in silico CNVs more sensitively than any other combination of programs or any single program. Furthermore, a model using the CNV detection-specific scores from all four programs as variables performed best overall in the validation. No single program detected all CNV sizes and types equally well or with sufficient accuracy. Therefore, a combination of carefully selected programs should be used to maximize detection accuracy. In addition, the detected CNVs should be reviewed with a statistical model to streamline and standardize the filtering of detections for annotation.
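The abstract does not state how the model was fitted or how each program's score was scaled, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each CNV detection is represented by one score per program (CoNIFER, XHMM, ExomeDepth, CODEX) rescaled to [0, 1], with 0.0 where a program made no overlapping call, so the feature vector also records which programs detected the CNV. The helper names (`fit_logistic`, `predict_proba`), the scaling, the 0.5 cutoff, and the toy training rows are all hypothetical and invented for illustration; the sketch fits an ordinary logistic regression by batch gradient descent on the log-loss and does not reproduce the authors' software.

```python
import math
from typing import List, Sequence, Tuple

# Order of the per-program score features (programs named in the abstract).
PROGRAMS = ("CoNIFER", "XHMM", "ExomeDepth", "CODEX")

# Hypothetical toy data, NOT from the study: each row is one CNV detection,
# each column an assumed score in [0, 1] from one program (0.0 = no call).
TRAIN_X = [
    [0.9, 0.8, 0.7, 0.9],  # true positive, called by all four programs
    [0.8, 0.0, 0.9, 0.7],  # true positive, missed by XHMM
    [0.7, 0.9, 0.0, 0.8],  # true positive, missed by ExomeDepth
    [0.9, 0.7, 0.8, 0.0],  # true positive, missed by CODEX
    [0.1, 0.0, 0.2, 0.0],  # false positive
    [0.0, 0.2, 0.0, 0.1],  # false positive
    [0.2, 0.0, 0.0, 0.3],  # false positive
    [0.0, 0.1, 0.1, 0.0],  # false positive
]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = true-positive detection


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that a detection with score vector x is a true positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear predictor.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = fit_logistic(TRAIN_X, TRAIN_Y)
    for name, wj in zip(PROGRAMS, w):
        print(f"{name}: weight {wj:.2f}")
    print(f"intercept: {b:.2f}")
    # Score a new detection (hypothetical scores) and apply an assumed 0.5 cutoff.
    new_call = [0.85, 0.0, 0.75, 0.6]
    p = predict_proba(w, b, new_call)
    print(f"P(true positive) = {p:.3f}, keep = {p >= 0.5}")
```

Encoding a missing call as 0.0 lets one model weigh both how strongly and how many programs support a detection, which mirrors the abstract's finding that combining all four programs helps. In practice a library implementation such as scikit-learn's `LogisticRegression` would usually replace the hand-written fitting loop.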
