Abstract

BackgroundCurrent high-throughput sequencing platforms provide capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short sample specific nucleotide sequence, barcode, to each DNA molecule prior pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read.In order to tolerate sequencing errors, barcodes should be sufficiently apart from each other in sequence space. An additional constraint due to both nucleotide usage and basecalling accuracy is that the proportion of different nucleotides should be in balance in each barcode position. The number of samples to be mixed in each sequencing run may vary and this introduces a problem how to select the best subset of available barcodes at sequencing core facility for each sequencing run. There are plenty of tools available for de novo barcode design, but they are not suitable for subset selection.ResultsWe have developed a tool which can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes.In our approach the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by user, the necessary constraints are automatically generated and the optimal solution can be found. The method is implemented in C programming language and web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.ConclusionsIncreasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes among the larger existing barcode set so that both sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easy to access via web browser.

Highlights

  • Current high-throughput sequencing platforms provide capacity to sequence multiple samples in parallel

  • We have applied our method to a barcode set consisting of 288 candidate sequences having a length of 8 nucleotides

  • The best possible position-wise nucleotide balance depends on the number of barcodes to be selected

Read more

Summary

Introduction

Current high-throughput sequencing platforms provide capacity to sequence multiple samples in parallel. The number of samples to be mixed in each sequencing run may vary and this introduces a problem how to select the best subset of available barcodes at sequencing core facility for each sequencing run. It is a common practice to pool several samples together in order to maximize the usage of the capacity of highthroughput sequencing platforms. In order to tolerate a single nucleotide mismatch in barcode detection, different barcode sequences should be at least three nucleotide mismatches apart from each other. In Illumina sequencers, the nucleotides are detected using two lasers, red laser for A/C and green laser for G/T. For optimal detection, these two nucleotide groups should be in balance between all barcodes in each barcode position. Besides nucleotide diversity being important in cluster identification, obtaining a good nucleotide balance is important for successful basecalling to be performed

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.