Abstract

The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high-quality eukaryotic genome assemblies remains a challenge, mostly because of the abundance of repetitive elements. These elements also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variation. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies then allows the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning to isolate and amplify individual mutant copies, which makes it hard to scale up for use with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to an NGS workflow. NG-SAM therefore has the potential to be scaled up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM can robustly reconstruct a wide range of potential target sequences of varying lengths and repetitive structures.
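As a rough illustration of the consensus idea behind SAM, the Python sketch below mutates many copies of a tandem repeat at random and recovers the original sequence by per-position majority vote. It is not the NG-SAM simulation pipeline described in this work. The repeat unit, mutation rate, number of copies, and helper names (`mutate`, `consensus`, `repeated_kmers`) are assumptions chosen for illustration, and the sketch assumes that each mutant copy has already been assembled in full and aligned, with substitutions only.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted
    independently with probability `rate` (substitutions only)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(copies):
    """Per-position majority vote across equal-length, aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def repeated_kmers(seq, k):
    """Count the distinct k-mers that occur more than once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for n in counts.values() if n > 1)


rng = random.Random(0)            # fixed seed so the run is reproducible
unit = "ACGTTGCAAGGCTTAC"         # hypothetical 16 bp repeat unit
target = unit * 8                 # perfectly repetitive 128 bp target

# Assumed values: 10% substitutions per copy, 30 independent mutant copies.
copies = [mutate(target, 0.10, rng) for _ in range(30)]
recovered = consensus(copies)

print("repeated 12-mers in target:     ", repeated_kmers(target, 12))
print("repeated 12-mers in first copy: ", repeated_kmers(copies[0], 12))
print("consensus equals target:        ", recovered == target)
```

Comparing the repeated k-mer counts shows how random substitutions change the repeat content of an individual copy, while the majority vote averages the mutations out again. In a real experiment each mutant copy would first have to be assembled from short reads, a step this sketch skips entirely.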

Highlights

  • Thanks to the increased throughput provided by Next Generation Sequencing technologies [1], de novo genome sequencing and resequencing are widely accessible research tools, significantly contributing to the advancement of many fields of biology and with many important applications

  • One of the main causes of assembly difficulties is the structure of the eukaryotic genome itself, and more precisely the abundance of repetitive elements, which leads to fragmented assemblies or complex misassemblies depending on the approach taken by the assembler [2,3]

  • An NG-SAM experiment has many parameters to be tuned, the most important being S0, the number of starting molecules; the dilution factors d1 and d2, which in combination with the PCR efficiencies determine the distribution of the number of mutant types in the sequenced mixture; and the number of mutations introduced by the mutagenic PCR (Figure 2; see Materials and Methods)

Introduction

Thanks to the increased throughput provided by Next Generation Sequencing (NGS) technologies [1], de novo genome sequencing and resequencing are widely accessible research tools that contribute significantly to the advancement of many fields of biology and have many important applications. At least in the case of second-generation technologies, however, the reads obtained are shorter than those provided by “traditional” Sanger sequencing. Read length is critical for obtaining high-quality genome assemblies, as longer reads are more likely to capture the context of repetitive units (see later). For this reason, genome assembly is more difficult when using NGS technologies [2,3], and so, despite the increase in sequencing throughput, obtaining “finished” assemblies of eukaryotic genomes remains a challenge that requires laborious experiments to resolve the problematic regions on a case-by-case basis [4,5].

Figure caption (figure not shown): The red units can be considered to be completely identical, but this is not necessary for assembly problems to present themselves.
