Abstract
The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants in target regions with a rapid turnaround time and at low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that the Ion Proton sequencer reported in 27 whole-exome sequencing datasets and that are present in neither the 1000 Genomes Project nor the Exome Aggregation Consortium. We classified the positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be corrected bioinformatically using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different profiles across error types. A refined calling algorithm combined with a better polymerase may improve the performance of the Ion Proton sequencing platform.
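The abstract does not spell out how the ultrarare set was built or how SNV clusters were recognized. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline. It treats a call as "ultrarare" when its exact (chromosome, position, ref, alt) key is absent from every population catalogue, such as 1000 Genomes and the Exome Aggregation Consortium (ExAC). It also flags SNVs that have several other SNVs nearby, which is one way to surface the "SNV cluster" pattern. The `Variant` class, the function names, and the thresholds (`window=10` bp, `min_count=3`) are hypothetical. Real calls would first need to be parsed from VCF and normalized.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """A normalized call; frozen so it is hashable and can live in sets."""
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str

def ultrarare(calls, *population_sets):
    """Keep calls absent from every population catalogue (hypothetical helper)."""
    known = set().union(*population_sets)
    return [v for v in calls if v not in known]

def flag_snv_clusters(calls, window=10, min_count=3):
    """Return SNVs with at least `min_count` SNVs (itself included) within
    +/- `window` bp on the same chromosome, sorted by position.
    Thresholds are illustrative assumptions, not values from the study."""
    snvs = [v for v in calls if len(v.ref) == 1 and len(v.alt) == 1]
    flagged = [
        v for v in snvs
        if sum(w.chrom == v.chrom and abs(w.pos - v.pos) <= window for w in snvs) >= min_count
    ]
    return sorted(flagged, key=lambda v: (v.chrom, v.pos))

# Toy data: three nearby SNVs, one catalogued SNV, and one insertion.
calls = [
    Variant("1", 100, "A", "G"),
    Variant("1", 104, "C", "T"),
    Variant("1", 108, "G", "A"),
    Variant("1", 5000, "T", "C"),
    Variant("2", 200, "A", "AT"),
]
thousand_genomes = {Variant("1", 5000, "T", "C")}  # hypothetical catalogue content
exac = set()

rare = ultrarare(calls, thousand_genomes, exac)
clustered = flag_snv_clusters(rare)
print(len(rare), [v.pos for v in clustered])  # 4 [100, 104, 108]
```

Exact key matching is deliberately strict. A variant represented differently in a catalogue, for example an indel left-aligned to another position, would slip through unless both sides are normalized the same way.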
Highlights
This process yielded 675 loci that were manually investigated to determine whether the variant calls were correct; these ultrarare variants were evenly distributed across all samples (194.3 ± 14.5; range, 163–224)
Summary
Several studies have used the Ion Proton sequencing platform from Thermo Fisher to identify causal variants in many genetic disorders [1,2,3]. The Ion Proton sequencer can reportedly achieve a rapid turnaround at low cost with high accuracy in genotype calls for targeted genomic regions [4]. However, systematic errors can be introduced into sequencing data by the polymerase reaction or by the mapping and variant-calling algorithms, especially in homopolymer-rich and AT-rich regions [5]. Understanding the characteristics of sequence data has therefore become an essential prerequisite for discovering genuine variants [6]. In the present study, we investigated variant-calling results from 27 whole-exome sequencing (WES) datasets generated by the Ion Proton sequencer.
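Because homopolymers and high AT content are cited as error-prone contexts, a caller's output can be annotated with the local sequence context around each call. The following Python sketch shows one way to do this. It is an illustration under assumptions, not a method described here. It measures the longest single-base run and the A/T fraction in a window around a 0-based reference index and flags the position when either exceeds a threshold. The function names and the thresholds (`flank=5`, `min_run=4`, `max_at=0.6`) are hypothetical choices.

```python
def homopolymer_run(seq, i):
    """Length of the single-base run that contains seq[i] (0-based index)."""
    base = seq[i]
    left, right = i, i
    while left > 0 and seq[left - 1] == base:
        left -= 1
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1

def local_context(seq, i, flank=5):
    """Longest homopolymer overlapping, and A/T fraction of,
    the window seq[i - flank : i + flank + 1] (clipped to the sequence)."""
    seq = seq.upper()
    start, end = max(0, i - flank), min(len(seq), i + flank + 1)
    window = seq[start:end]
    longest = max(homopolymer_run(seq, j) for j in range(start, end))
    at_fraction = sum(b in "AT" for b in window) / len(window)
    return longest, at_fraction

def is_error_prone(seq, i, flank=5, min_run=4, max_at=0.6):
    """Flag a position near a long homopolymer or inside an AT-rich window.
    Thresholds are illustrative assumptions, not values from the study."""
    longest, at_fraction = local_context(seq, i, flank)
    return longest >= min_run or at_fraction > max_at

# A 5-bp T run lies within 5 bp of index 0; the second sequence has no runs.
print(is_error_prone("ACGTTTTTGCA", 0), is_error_prone("ACGTACGTAC", 4))  # True False
```

A flag of this kind only marks calls that deserve closer review, such as manual inspection of the read-level alignments. It does not decide on its own whether a call is a false positive.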