Abstract

We consider the design and evaluation of short barcodes, six to eight nucleotides long, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods: one based on the currently best known linear codes, and a simple randomized construction. The evaluation is done with respect to error correction capability, barcode size, experimental parameters, and fundamental bounds on code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides, where for length eight, two substitution errors can be corrected. In fact, no code with larger minimum distance can exist.
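To make the link between minimum distance and error correction concrete, here is a minimal Python sketch. It relies on the standard coding-theory fact that a code with minimum Hamming distance d corrects up to floor((d - 1) / 2) substitutions, so correcting two substitutions requires d of at least 5. The function names and the toy code are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code: Sequence[str]) -> int:
    """Smallest pairwise Hamming distance over all distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def correctable_substitutions(d_min: int) -> int:
    """A code with minimum distance d_min corrects floor((d_min - 1) / 2) substitutions."""
    return (d_min - 1) // 2


# Hypothetical toy barcode set of length six (not one of the published codes).
toy_code = ["AAAAAA", "CCCAAA", "AAACCC", "GGGGGG"]
d = minimum_distance(toy_code)
t = correctable_substitutions(d)
print(f"minimum distance {d}, corrects {t} substitution(s)")
```

The toy set only illustrates the calculation; it ignores the biological constraints discussed in the text.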

Highlights

  • Modern high-throughput DNA sequencing techniques allow RNA from different independent samples to be sequenced in a single run

  • The reads obtained by the sequencing procedure can be demultiplexed afterwards, i.e., they are assigned to the different samples

  • We formally describe the channel by the probability of obtaining a received word r given that a code word c was chosen, i.e., by P(r | c)
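The demultiplexing step described in the highlights can be sketched as nearest-code-word decoding. This is a hedged illustration, not the paper's procedure: it assumes a memoryless channel in which each base is substituted independently, with all substitutions equally likely and a per-base error probability below 3/4, in which case maximizing P(r | c) is equivalent to minimizing Hamming distance. The function `demultiplex`, its `max_errors` parameter, and the toy code are hypothetical.

```python
from typing import Optional, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read_tag: str, code: Sequence[str], max_errors: int) -> Optional[int]:
    """Return the index of the unique closest code word, or None.

    None is returned if the closest code word is farther away than
    max_errors substitutions or if two code words are equally close.
    """
    best_index, best_dist, tie = None, len(read_tag) + 1, False
    for i, word in enumerate(code):
        dist = hamming_distance(read_tag, word)
        if dist < best_dist:
            best_index, best_dist, tie = i, dist, False
        elif dist == best_dist:
            tie = True
    if tie or best_dist > max_errors:
        return None
    return best_index


# Hypothetical toy barcode set with minimum distance 3 (corrects one substitution).
toy_code = ["AAAAAA", "CCCAAA", "AAACCC", "GGGGGG"]
sample_a = demultiplex("AAAAAT", toy_code, max_errors=1)  # one substitution from word 0
sample_b = demultiplex("ACCAAA", toy_code, max_errors=1)  # one substitution from word 1
unassigned = demultiplex("TTTTTT", toy_code, max_errors=1)  # too far from every word
```

Rejecting ambiguous or distant tags, rather than guessing, trades a few discarded reads for a lower risk of assigning a read to the wrong sample.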

Summary

Introduction

Modern high-throughput DNA sequencing techniques allow RNA from different independent samples to be sequenced in a single run. For this purpose, the cDNA molecules of each sample are tagged with a unique sequence, the code word, and pooled into a single library [1]. Because of errors during library preparation and sequencing, cross-talk may occur, where reads are assigned to the wrong sample. This is especially important when a gene is transcribed at very different levels in two samples. Biased GC content or long homopolymers increase the error rates in the enzymatic processes used
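The biological constraints mentioned above (GC content, homopolymer runs) and a randomized construction can be sketched together as follows. This is only an illustrative greedy random search and is not claimed to be the paper's randomized method. The thresholds `gc_min`, `gc_max`, and `max_run`, and all function names, are hypothetical choices made for the example.

```python
import random
from itertools import groupby
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(word: str) -> float:
    """Fraction of bases in the word that are G or C."""
    return sum(base in "GC" for base in word) / len(word)


def longest_homopolymer(word: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(word))


def is_admissible(word: str, gc_min: float = 0.4, gc_max: float = 0.6,
                  max_run: int = 2) -> bool:
    """Check the (assumed) GC-content window and homopolymer limit."""
    return gc_min <= gc_fraction(word) <= gc_max and longest_homopolymer(word) <= max_run


def random_code(length: int, d_min: int, attempts: int, seed: int = 0) -> List[str]:
    """Greedily collect random admissible words at distance >= d_min from all kept words."""
    rng = random.Random(seed)
    code: List[str] = []
    for _ in range(attempts):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not is_admissible(candidate):
            continue
        if all(hamming_distance(candidate, word) >= d_min for word in code):
            code.append(candidate)
    return code


# Length eight with minimum distance five, i.e. two correctable substitutions.
barcodes = random_code(length=8, d_min=5, attempts=2000, seed=1)
print(len(barcodes), "barcodes found")
```

A greedy search like this gives no guarantee on code size; comparing its output against known bounds and against codes derived from linear codes is the kind of evaluation the abstract describes.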
