Abstract

The virulence and pathogenicity of bacterial pathogens are related to their adaptability to changing environments. One process enabling adaptation is based on minor changes in genome sequence, as small as a few base pairs, within segments of the genome called simple sequence repeats (SSRs), which consist of multiple copies of a short sequence (from one to several nucleotides) repeated in series. SSRs are found in eukaryotes as well as prokaryotes, and length variation in them arises through slipped-strand mispairing (SSM) by DNA polymerase during replication, at frequencies up to a million-fold higher than those of bacterial point mutations. The characterization of SSR length by standard sequencing methods is complicated by length variation introduced during the sequencing process itself, which obscures the lower-abundance repeat number variants in a population. Here we report a computational approach to correct for artifacts induced by the sequencing process, validated for tetranucleotide repeats by use of synthetic constructs of fixed, known length. We apply this method to a laboratory culture of Histophilus somni, prepared from a single colony, and demonstrate that the culture consists of populations of distinct sequence phase and length variants at individual tetranucleotide SSR loci.

Highlights

  • The virulence and pathogenicity of bacterial pathogens are related to their adaptability to changing environments

  • Long-read SMRT sequencing in circular consensus sequence (CCS) mode holds promise to improve interpretation of sequence data with respect to simple sequence repeat (SSR) variability, as it avoids amplification-based artifacts and repeatedly sequences across both strands of the SSR to correct for errors that may occasionally occur in a single pass

  • A library consisting of a single 63-base SSR (16 RU-1 bp, 63 bp) (Fig. 1a–c) was sequenced, and the CCS were mapped to the control sequence to derive a set of CCS regions of interest (ROIs)


Summary

Introduction

The virulence and pathogenicity of bacterial pathogens are related to their adaptability to changing environments. Genotyping of SSR microsatellites for genetic studies in mammalian genomes demonstrates the generation of variant amplification products (“stutter bands”), whether analyzed by gel-based sizing or by sequencing-by-synthesis (SBS) methods[4,5,6]. This artifact may occur even when libraries are prepared without amplification because the sequencing platform itself uses PCR for generating clusters for sequencing. CCS sequencing has been previously applied to improve the characterization of longer SSRs in mammalian genomes and identify associations with disease[9,10,11,12,13]. In general, these genetic studies were designed to determine the diploid genotype of the individual, and when small numbers of sequences were observed with lengths other than the two main genotypes, they were ignored. The z-score threshold is the maximum allowable absolute value of the z-scores that a given CCS must possess to pass the filter.
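
The z-score filter described above can be illustrated with a minimal sketch. This excerpt does not state which per-read quantities the z-scores are computed on, so the metric name `passes`, the record layout, and the function `zscore_filter` are all hypothetical, and the sketch should not be read as the authors' implementation. It only shows the general idea: a CCS read passes when the absolute value of each of its z-scores is at or below the threshold.

```python
from statistics import mean, pstdev


def zscore_filter(reads, metrics, threshold):
    """Return the reads whose |z-score| is <= threshold for every metric.

    reads     : list of dicts, one per CCS read (hypothetical layout)
    metrics   : names of numeric fields to compute z-scores on (assumed)
    threshold : maximum allowable absolute z-score
    """
    # Mean and population standard deviation of each metric across all reads.
    stats = {}
    for m in metrics:
        values = [r[m] for r in reads]
        stats[m] = (mean(values), pstdev(values))

    kept = []
    for r in reads:
        passes_filter = True
        for m in metrics:
            mu, sigma = stats[m]
            # If every read has the same value, no read deviates: z = 0.
            z = 0.0 if sigma == 0 else (r[m] - mu) / sigma
            if abs(z) > threshold:
                passes_filter = False
                break
        if passes_filter:
            kept.append(r)
    return kept


# Illustrative data only: nine similar reads and one outlier.
example_reads = [{"id": i, "passes": 10} for i in range(9)]
example_reads.append({"id": 9, "passes": 100})
filtered = zscore_filter(example_reads, ["passes"], threshold=2.0)
print(len(filtered), "of", len(example_reads), "reads kept")
```

Lowering the threshold makes the filter stricter, while raising it lets more atypical reads through.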

