Abstract

Next-generation sequencing can be costly and labour intensive. Usually, the sequencing cost per sample is reduced by pooling amplified DNA (amplicons) derived from different individuals on the same sequencing lane. Barcodes unique to each amplicon permit short-read sequences to be assigned appropriately. However, the cost of the library preparation increases with the number of barcodes used. We propose an alternative to barcoding: by using different known proportions of individually derived amplicons in a pooled sample, each is characterised a priori by an expected depth of coverage. We have developed a Hidden Markov Model that uses these expected proportions to reconstruct the input sequences. We apply this method to pools of mitochondrial DNA amplicons extracted from kangaroo meat, genus Macropus. Our experiments indicate that the sequence coverage can be efficiently used to index the short reads and that we can reassemble the input haplotypes when secondary factors impacting the coverage are controlled. We therefore demonstrate that, by combining our approach with standard barcoding, the cost of the library preparation is reduced to a third.

Highlights

  • Next-generation sequencing technologies that utilise many short fragments of DNA allow scientists to assemble the sequences of individual genes and genomes

  • The kangaroo species were unknown at the time of purchase, but statistics from the Department of the Environment and Energy of Australia indicate that they must be of genus Macropus, and most likely Eastern grey kangaroo, Macropus giganteus, as it is the largest population with the highest quota in New South Wales [15]

  • Our results demonstrate that individual haplotypes can be reconstructed when pooled in a single sample by using different known amounts of DNA that are carefully chosen
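
As a rough illustration of why the pooled amounts must be carefully chosen, the sketch below computes the expected depth per individual from the pooling proportions and maps an observed allele fraction at a variable site to the subset of individuals whose combined share is closest to it. It is a deliberately simplified stand-in, not the Hidden Markov Model described in the abstract; the proportions 1:2:4, the total depth of 700, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def expected_depths(proportions, total_depth):
    """Split a total depth of coverage according to the pooling proportions."""
    total = sum(proportions)
    return [total_depth * p / total for p in proportions]


def subset_fractions(proportions):
    """Expected allele fraction for every subset of individuals carrying an allele."""
    total = sum(proportions)
    n = len(proportions)
    table = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            table[subset] = sum(proportions[i] for i in subset) / total
    return table


def closest_subset(observed_fraction, proportions):
    """Return the subset whose expected fraction is nearest to the observed one."""
    table = subset_fractions(proportions)
    return min(table, key=lambda s: abs(table[s] - observed_fraction))


# Hypothetical pool of three individuals mixed in proportions 1:2:4.
proportions = [1, 2, 4]
print(expected_depths(proportions, 700))
print(closest_subset(0.43, proportions))
```

With proportions 1:2:4, every subset of individuals has a distinct expected fraction, so an observed fraction near 0.43 points to the first two individuals (indices 0 and 1). Equal proportions would give several subsets the same expected fraction, and the individuals could no longer be told apart by coverage alone.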

Introduction

Next-generation sequencing technologies that utilise many short fragments of DNA allow scientists to assemble the sequences of individual genes and genomes. Although full genome sequencing approaches are becoming more feasible, targeted sequencing methods such as amplicon sequencing are still used as a cost-effective means of genotyping large numbers of individuals [1,2,3,4,5]. A standard approach to short-read sequencing is to pool amplicons from multiple individuals into a single sample that is subsequently sequenced. To identify the provenance of short reads, this method requires a unique DNA sequence tag, or “barcode”, to be attached to the short fragments of DNA obtained from the amplicon(s) of each individual. The barcodes allow the sequences associated with a particular individual to be separated computationally.
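
The computational separation step can be sketched as follows. This is a minimal illustration, assuming exact-match barcodes of equal length at the start of each read; the barcode sequences, sample names, and the `demultiplex` function are hypothetical and not taken from the study.

```python
def demultiplex(reads, barcodes):
    """Group reads by their leading barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence to individual name
    Returns (bins, unassigned): reads per individual with the barcode
    trimmed off, and the reads whose prefix matched no barcode.
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    length = lengths.pop()

    bins = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        name = barcodes.get(read[:length])
        if name is None:
            unassigned.append(read)
        else:
            bins[name].append(read[length:])
    return bins, unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "individual_1", "TGCA": "individual_2"}
reads = ["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"]
bins, unassigned = demultiplex(reads, barcodes)
print(bins)
print(unassigned)
```

Real demultiplexing tools usually tolerate sequencing errors in the barcode; exact matching is used here only to keep the example short.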
