Abstract

Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such libraries on the Illumina platform typically produces low quality data because of limitations in the Illumina cluster calling algorithm. Moreover, even for diverse samples, these limitations cause substantial inaccuracies in the assignment of reads to multiplexed samples (sample bleeding). Such inaccuracies are unacceptable in clinical applications and in some other fields (e.g. detection of rare variants). Here, we discuss how both the low quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during the initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol markedly increases data quality for low diversity samples compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For completeness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and for lowering cluster density, which in our experience consistently produce high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing yields low quality data and no more biological material is available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy.
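Whether a barcode set will create low initial diversity can be checked before sequencing. The sketch below is a minimal illustration under stated assumptions, not the authors' design procedure: it counts, for each position, how often each base occurs across the barcodes pooled in one lane and flags positions where any base is rarer than a threshold. The function names and the 10% default (`min_fraction`) are hypothetical choices, and the check assumes every sample contributes the same number of clusters.

```python
from collections import Counter

BASES = "ACGT"

def position_composition(barcodes):
    """For each barcode position, return the fraction of barcodes carrying each base.

    Assumes every sample in the pool contributes the same number of clusters.
    """
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    compositions = []
    for i in range(length):
        counts = Counter(b[i].upper() for b in barcodes)
        compositions.append({base: counts[base] / len(barcodes) for base in BASES})
    return compositions

def low_diversity_positions(barcodes, min_fraction=0.10):
    """Return positions where at least one base is rarer than min_fraction.

    min_fraction is a hypothetical threshold, not a manufacturer specification.
    """
    return [
        i
        for i, comp in enumerate(position_composition(barcodes))
        if min(comp.values()) < min_fraction
    ]

# A balanced set: every base appears once at every position.
print(low_diversity_positions(["ACGT", "CGTA", "GTAC", "TACG"]))  # []
# An unbalanced set: the first three positions lack at least one base.
print(low_diversity_positions(["ACGT", "ACTG", "ACGA", "ACTC"]))  # [0, 1, 2]
```

A flagged position is one where the pooled barcodes present low diversity to the instrument; swapping or adding barcodes until the list is empty is one way to act on the design-stage advice.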

Highlights

  • Next-generation sequencing technology is rapidly developing and has become one of the most popular and crucial techniques used today to answer key biomedical questions

  • As we explained in the introduction, the problem with sequencing low diversity data on the Illumina platform is caused by the fixed cluster template, built from the first four sequencing cycles, which does not take into account possible low diversity sequence at the beginning of the read

  • Because the Illumina software is not open source, not all modifications are possible, but we have developed several simple-to-implement modifications (Long Template Protocol) that have a strong positive impact on the quality of sequencing low diversity data on Illumina platforms

Summary

Introduction

Next-generation sequencing technology is rapidly developing and has become one of the most popular and crucial techniques used today to answer key biomedical questions. Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes (… [5]) result in sequencing libraries with low sequence diversity in the initial bases of the sequenced reads. The standard Illumina data analysis protocol uses only the images corresponding to the first four positions in the reads to determine the coordinates of the clusters on the flowcell, a key step in sequencing image analysis. Libraries with low sequence diversity in these first four positions produce images that pose a considerable challenge to the image recognition algorithm and usually yield low quality data on the Illumina platform. The same software issue that lowers the quality of data from samples with low initial sequence diversity is also a major source of sequencing errors in normal samples.
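To gauge how low the initial diversity of a library is, one can tabulate the base composition of the first four cycles, the cycles the standard protocol uses for cluster detection. The sketch below is a hypothetical illustration: it uses Shannon entropy as a generic diversity measure (2 bits when all four bases are equally frequent, 0 bits when only one base occurs), which is an assumption and not the metric used by Illumina's software. It also models mixing with a diverse library as a weighted average of compositions, assuming clusters form in proportion to the mixing fraction; the uniform composition stands in for an idealized diverse library.

```python
import math
from collections import Counter

BASES = "ACGT"

def cycle_composition(reads, n_cycles=4):
    """Base fractions at each of the first n_cycles positions (N and other symbols ignored)."""
    compositions = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads if len(r) > i and r[i] in BASES)
        total = sum(counts.values())
        compositions.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return compositions

def entropy_bits(comp):
    """Shannon entropy in bits: 2.0 for a uniform composition, 0.0 for a single base."""
    return sum(-p * math.log2(p) for p in comp.values() if p > 0)

def mix(comp_low, comp_diverse, fraction_diverse):
    """Expected composition when fraction_diverse of clusters come from the diverse library."""
    return {
        b: (1 - fraction_diverse) * comp_low[b] + fraction_diverse * comp_diverse[b]
        for b in BASES
    }

# Amplicon-like reads that all start with the same four bases.
reads = ["TACGGATC", "TACGTTAG", "TACGCAGT"]
uniform = {b: 0.25 for b in BASES}  # idealized fully diverse library
for cycle, comp in enumerate(cycle_composition(reads), start=1):
    mixed = mix(comp, uniform, 0.5)
    print(cycle, round(entropy_bits(comp), 3), round(entropy_bits(mixed), 3))
```

In this toy library every one of the first four cycles has zero entropy on its own; in the model, a 50% admixture of the idealized diverse library raises it to about 1.55 bits per cycle.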
