Abstract
Base-calling accuracy is crucial for high-throughput DNA sequencing and downstream analysis such as read mapping and genome assembly. Accordingly, we made an endeavor to reduce DNA sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing errors in Illumina systems, even after correcting color crosstalk caused by the overlap of dye emission spectra and phasing/pre-phasing caused by out-of-step nucleotide synthesis. Interestingly and importantly, spatial crosstalk between adjacent clusters is cluster-specific and often asymmetric, which cannot be corrected by existing deconvolution methods. Therefore, we introduce a novel mathematical method able to estimate and remove spatial crosstalk, thereby reducing base-calling errors by 44–69% at a given mapping rate from Illumina systems. Furthermore, the resolution gained from this work provides new room for higher throughput of DNA sequencing and of general measurement systems using fluorescence-based imaging technology. The resulting base-caller 3Dec is available for academic users at http://github.com/flishwnag/3dec. Not only does it reduce 62.1% errors compared to the standard pipeline, but also its implementation is fast enough for daily sequencing.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.