Abstract

Recent advances in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown. In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets containing various levels of heterogeneity. We perform a realistic evaluation of several error-correction tools. To measure the efficacy of these techniques, we apply a UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and real raw reads. In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. We also identify the techniques that offer a good balance between precision and sensitivity. This highlight showcases the main findings of our paper [1], illustrating the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology.
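
As an illustration of the precision-sensitivity trade-off mentioned above, the short Python sketch below compares a tool's corrections against an error-free ground truth (such as one derived from UMI-based high-fidelity sequencing). This is not code from the paper; the function name, inputs, and example counts are assumptions made for illustration only.

# Illustrative sketch (assumed names, not from the paper [1]):
# precision and sensitivity of an error-correction tool, given per-base
# counts obtained by comparing corrected reads with error-free ground truth.
def precision_sensitivity(true_positives, false_positives, false_negatives):
    # true_positives  -- erroneous bases correctly fixed
    # false_positives -- correct bases erroneously changed by the tool
    # false_negatives -- erroneous bases left uncorrected
    corrected = true_positives + false_positives
    errors = true_positives + false_negatives
    precision = true_positives / corrected if corrected else 0.0
    sensitivity = true_positives / errors if errors else 0.0
    return precision, sensitivity

# Hypothetical example: a tool fixes 900 of 1,000 errors but introduces 50 new ones.
p, s = precision_sensitivity(true_positives=900, false_positives=50, false_negatives=100)
print("precision=%.3f, sensitivity=%.3f" % (p, s))  # precision=0.947, sensitivity=0.900

A tool that aggressively corrects reads may reach high sensitivity while paying for it in precision by altering bases that were already correct; the balance between the two is what the evaluation in [1] compares across datasets.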
