Accurately identifying constitutional or germline copy number variants (gCNVs) from whole-exome sequencing data remains challenging, which has implications for the genetic diagnosis of ‘rare undiagnosed disease’ in the clinical setting. Although multiple algorithms have been proposed, a systematic comparison of these algorithms for calling gCNVs and analyzing inheritance patterns has yet to be fully conducted. Therefore, we empirically compared seven exome-based algorithms (XHMM, CLAMMS, CODEX2, ExomeDepth, DECoN, CN.MOPS, and GATK gCNV) for calling gCNVs in 151 individuals from 44 pedigrees, using genotyping-derived gCNVs from the same cohort as the gold standard for performance assessment. The algorithms varied in their power to identify gCNVs, although the size distributions of the gCNVs they called were similar. The number of gCNVs shared across algorithms was limited (e.g., only four gCNVs were shared among all seven algorithms); however, some algorithms showed greater consistency with one another (e.g., 1,843 gCNVs were shared between DECoN and ExomeDepth). CLAMMS and CODEX2 outperformed the remaining algorithms, with relatively higher F-scores (0.145 and 0.152, respectively). In addition, the algorithms exhibited different levels of Mendelian inconsistency in their gCNV calls, and significant challenges remained in inheritance pattern analysis. In conclusion, the choice of algorithm may have important implications for gCNV-based inheritance pattern analysis in family-based studies.
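To make the F-score comparison concrete, the following is a minimal sketch of how an exome-based call set might be scored against a gold-standard set. It is illustrative only and not the procedure used in the study: the `CNV` record, the `reciprocal_overlap` and `f_score` helpers, and the 50% reciprocal-overlap matching rule are all assumptions introduced here.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    """Hypothetical CNV record; field layout is an assumption."""
    sample: str
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV, min_frac: float = 0.5) -> bool:
    """True if a and b match on sample, chromosome, and type, and their
    shared span covers at least min_frac of each interval (assumed rule)."""
    if (a.sample, a.chrom, a.kind) != (b.sample, b.chrom, b.kind):
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))


def f_score(calls: List[CNV], truth: List[CNV],
            min_frac: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of calls against truth."""
    # Calls confirmed by at least one gold-standard CNV.
    hit_calls = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth)
                    for c in calls)
    # Gold-standard CNVs recovered by at least one call.
    hit_truth = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls)
                    for t in truth)
    precision = hit_calls / len(calls) if calls else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score as the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because the F-score is the harmonic mean of precision and recall, a low value on either side pulls the score down sharply, which is one plausible reason scores in this kind of comparison can be low even when many calls are made.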