The lack of reliable reference samples containing different mutations of interest across large sets of disease-relevant loci limits the extensive validation of clinical next-generation sequencing (NGS) assays and their associated bioinformatics pipelines. Herein, we have generated a publicly available, highly flexible tool, in silico Mutator (insiM), to introduce point mutations, insertions, deletions, and duplications of any size into real data sets from amplicon-based or hybrid-capture NGS assays. insiM accepts an alignment file along with the target territory and produces paired-end FASTQ files containing the specified mutations by modifying the original sequencing reads. Mutant signal is thus generated within the context of existing real-world data to most closely mimic assay performance. The resulting files may then be passed through the assay's bioinformatics pipeline to assist with assay and bioinformatics validation and to identify performance gaps in detection. To establish the basic functionality of the software, a series of simulation experiments with varying mutation types, sizes, and allele frequencies was performed across the entire clinical territory of hybrid-capture and amplicon-based clinical assays developed at The University of Chicago. This work demonstrates the utility of insiM as a supplementary tool during the validation of an NGS assay's bioinformatics pipeline.
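The idea of generating mutant signal by modifying original reads can be illustrated with a minimal Python sketch. This is not insiM's code or interface: the function name `spike_snv`, the `(start, sequence)` read representation, and the assumption that reads align without indels or clipping are all hypothetical simplifications chosen for illustration. It substitutes an alternate base into a randomly chosen fraction of the reads covering one position, approximating a point mutation at a target allele frequency.

```python
import random


def spike_snv(reads, pos, alt, vaf, seed=0):
    """Substitute `alt` at reference position `pos` in a fraction `vaf`
    of the reads that cover it, leaving all other reads untouched.

    Each read is a (start, sequence) tuple, where `start` is the 0-based
    reference coordinate of the first base. Reads are assumed to align
    without indels or clipping (a simplification for this sketch).
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible

    # Indices of reads whose aligned span includes the target position.
    covering = [
        i for i, (start, seq) in enumerate(reads)
        if start <= pos < start + len(seq)
    ]

    # Number of covering reads to mutate for the requested allele frequency.
    n_mut = round(vaf * len(covering))
    chosen = set(rng.sample(covering, n_mut))

    out = []
    for i, (start, seq) in enumerate(reads):
        if i in chosen:
            off = pos - start  # offset of the target base within the read
            seq = seq[:off] + alt + seq[off + 1:]
        out.append((start, seq))
    return out


# Hypothetical toy input: four reads cover position 5, one does not.
reads = [
    (0, "ACGTACGT"),
    (2, "GTACGTAC"),
    (4, "ACGTACGT"),
    (5, "CGTACGTA"),
    (10, "GTAC"),
]
mutated = spike_snv(reads, pos=5, alt="T", vaf=0.5)
```

Because only existing reads are edited, base qualities, coverage, and read positions stay as they were in the real data; insertions, deletions, and duplications would additionally require shifting or trimming bases, which this sketch does not attempt.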