Abstract

Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment because of sequencing errors, short read lengths, and the high incidence of polymerase slippage in microsatellite regions. Although the information they carry is highly valuable, microsatellite variations have not received enough attention to become a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed genetic variation catalog for humans, the consortium released only two microsatellite prediction sets, generated by two tools. Many other large research efforts have likewise failed to shed light on microsatellite variation. We evaluated the performance of three local assembly methods in three experimental settings, focusing on genotype-based performance, the impact of coverage, and preprocessing that includes flanking regions. All of these experiments supported our initial expectations about assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity in microsatellite variant calling than a de Bruijn graph-based approach. We conclude that OLC-based assembly is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
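One commonly cited intuition for the de Bruijn graph result (our illustration, not an analysis from the paper) is that a de Bruijn graph is built from distinct k-mers, and when a perfect repeat tract is longer than k, two alleles that differ only in copy number yield exactly the same set of k-mers. The repeat collapses into a cycle, and only k-mer multiplicity (coverage) still reflects the tract length. An OLC assembler instead compares whole reads, so a read that spans the tract and both unique flanks retains the exact copy number. The minimal sketch below assumes toy, hypothetical flanking sequences, a (CA)n motif, and k = 7.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def allele(n_units: int, motif: str = "CA") -> str:
    """Build a toy allele: hypothetical unique flanks around a perfect (motif)n tract."""
    left_flank = "GATTACAGGT"   # hypothetical flank, not from the paper
    right_flank = "TTGCCATGAA"  # hypothetical flank, not from the paper
    return left_flank + motif * n_units + right_flank


k = 7
short_counts = kmer_counts(allele(10), k)  # (CA)10 allele
long_counts = kmer_counts(allele(12), k)   # (CA)12 allele

# The distinct k-mers, from which a de Bruijn graph would be built, are identical.
print(set(short_counts) == set(long_counts))            # True
# Only the multiplicity of an internal repeat k-mer records the tract length.
print(short_counts["CACACAC"], long_counts["CACACAC"])  # 7 9
```

In practice, de Bruijn graph assemblers use coverage and read information to resolve such cycles, so this example only shows where the ambiguity comes from. It does not claim that any specific tool fails on it.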

Highlights

  • One of the primary aims of genomics studies is to characterize genetic variations and associate them with phenotypes including genetic diseases

  • In this paper we addressed the problem of characterizing microsatellites, important sources of genetic variation that are not fully covered by large-scale genome projects

  • To help improve microsatellite polymorphism discovery with short-read data, we proposed an end-to-end solution based on local assembly and compared it against a mapping-based solution
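After local assembly, genotyping a microsatellite reduces to comparing the repeat copy number in the assembled contig with the copy number in the reference at the same locus. The sketch below illustrates that final step under strong simplifying assumptions: a known motif, perfect (uninterrupted) repeats, a single haplotype, and a contig that already covers the locus. The helpers `longest_tandem_run` and `repeat_unit_delta` and the flank sequences are hypothetical illustrations. They are not functions from the STRAssembly repository.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the copy number of the longest perfect, uninterrupted run of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    step = len(motif)
    best = 0
    for start in range(len(seq)):
        units, pos = 0, start
        # Extend the run one motif copy at a time from this start position.
        while seq.startswith(motif, pos):
            units += 1
            pos += step
        best = max(best, units)
    return best


def repeat_unit_delta(contig: str, reference_window: str, motif: str) -> int:
    """Copy-number difference of the assembled allele relative to the reference (hypothetical helper)."""
    return longest_tandem_run(contig, motif) - longest_tandem_run(reference_window, motif)


left, right = "GATTACAGGT", "TTGCCATGAA"  # hypothetical unique flanks
reference_window = left + "CA" * 10 + right
contig = left + "CA" * 12 + right  # toy locally assembled contig carrying a +2 allele

print(repeat_unit_delta(contig, reference_window, "CA"))  # 2
```

A real caller would also need to tolerate interrupted repeats and residual sequencing errors, and it would have to report both alleles of a diploid genotype. The brute-force scan here is quadratic in contig length, which is acceptable only for short local contigs.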

Introduction

One of the primary aims of genomics studies is to characterize genetic variations and associate them with phenotypes, including genetic diseases. Genome-wide association analyses have already identified thousands of genetic loci linked to human phenotypes, diseases, complex traits, and disorders. While these studies have identified many types of genetic variation, such as single-nucleotide polymorphisms (SNPs), copy number variations (CNVs), and structural variations (SVs), microsatellite polymorphism remains largely understudied (Gymrek et al, 2016). The 1000 Genomes Project (The 1000 Genomes Project Consortium, 2015), which aimed to establish the most detailed genetic variation catalog for humans, analyzed 2504 individuals from 26 populations and reported in detail only SNPs, indels, and a limited set of structural variation types (i.e., deletions, small inversions, mobile element insertions, and tandem duplications). The 1000 Genomes Project and other large research efforts have done little to shed light on microsatellite polymorphism.
