Abstract

A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that persists through polymerase chain reaction (PCR) amplification, thus reducing the bias introduced by this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
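To make the idea of a per-UMI consensus concrete, the following is a minimal Python sketch, not UMIc's actual R implementation. It assumes all reads in a UMI group have equal length, and it uses a hypothetical rule: at each position the most frequent base wins, with ties broken by mean Phred quality. The function name `consensus` and the tie-breaking rule are illustrative assumptions; the tool's real criteria may differ.

```python
from collections import defaultdict


def consensus(reads):
    """Build a consensus from reads that share one UMI.

    reads: list of (sequence, phred_scores) tuples of equal length.
    Hypothetical rule (an assumption, not UMIc's documented one):
    pick the most frequent base per position, breaking ties by
    the mean Phred quality of the reads supporting that base.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0][0])
    if any(len(seq) != length or len(q) != length for seq, q in reads):
        raise ValueError("all reads and quality lists must have equal length")

    bases, quals = [], []
    for i in range(length):
        # Collect the quality scores observed for each base at position i.
        support = defaultdict(list)
        for seq, q in reads:
            support[seq[i]].append(q[i])
        # Rank candidates by (frequency, mean quality).
        best = max(
            support,
            key=lambda b: (len(support[b]), sum(support[b]) / len(support[b])),
        )
        bases.append(best)
        # Report the mean quality of the supporting reads.
        quals.append(round(sum(support[best]) / len(support[best])))
    return "".join(bases), quals


group = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGA", [30, 30, 30, 10]),
    ("ACGT", [30, 30, 30, 20]),
]
seq, qual = consensus(group)
```

In this example the low-quality `A` at the last position is outvoted by two reads carrying `T`, so the consensus keeps `T`.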

Highlights

  • The introduction of next-generation sequencing (NGS) has revolutionized genomic research and has had a tremendous impact on clinical applications (Lander et al, 2001; Shen et al, 2015)

  • We propose an R-based framework, called UMIc, which preprocesses raw FASTQ files using an alignment-free method

  • By adding a random unique molecular identifier (UMI) to each read, it is possible to exclude duplicates based on the UMIs


Introduction

The introduction of next-generation sequencing (NGS) has revolutionized genomic research and has had a tremendous impact on clinical applications (Lander et al, 2001; Shen et al, 2015). However, detecting true mutations in low-frequency alleles or rare subclones that may contribute to disease at an early stage remains a major challenge in cancer studies. This is mainly due to the NGS library preparation process, which includes multiple rounds of polymerase chain reaction (PCR) amplification and thereby introduces PCR duplicates and artifacts into the output sequences. This limitation has been addressed by the use of unique molecular identifiers (UMIs), which facilitate the detection and removal of PCR duplicates.
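As an illustration of how UMIs allow PCR duplicates to be collapsed, here is a minimal Python sketch. It is a generic, commonly used greedy strategy and not the procedure implemented in UMIc: UMIs are processed from most to least abundant, and a UMI within a small Hamming distance of an already accepted one is treated as a sequencing or PCR error of it. The function names and the `max_dist` threshold are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_dist=1):
    """Greedily merge UMIs that differ by at most max_dist mismatches.

    umis: one UMI string per read.
    Returns a dict mapping each representative UMI to its total read count.
    max_dist=1 is an illustrative default, not a value taken from UMIc.
    """
    counts = Counter(umis)
    representatives = {}
    # Most abundant UMIs first; alphabetical order breaks count ties.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if hamming(umi, rep) <= max_dist:
                # Treat this UMI as an erroneous copy of the representative.
                representatives[rep] += n
                break
        else:
            representatives[umi] = n
    return representatives


read_umis = ["AAAA"] * 5 + ["AAAT"] + ["CCCC"] * 3
groups = collapse_umis(read_umis)
```

Here the single `AAAT` read is absorbed into the `AAAA` group, leaving two groups that stand for two original molecules.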
