Abstract

Sequencing reads overlapping polymorphic sites in diploid mammalian genomes may be assigned to one allele or the other. This holds the potential to detect gene expression, chromatin modifications, DNA methylation or nuclear interactions in an allele-specific fashion. SNPsplit is an allele-specific alignment sorter designed to read files in SAM/BAM format and determine the allelic origin of reads or read-pairs that cover known single nucleotide polymorphic (SNP) positions. For this to work, libraries must have been aligned to a genome in which all known SNP positions were masked with the ambiguity base 'N', using a suitable mapping program such as Bowtie2, TopHat, STAR, HISAT2, HiCUP or Bismark. SNPsplit also provides an automated solution to generate N-masked reference genomes for hybrid mouse strains based on the variant call information provided by the Mouse Genomes Project. The unique ability of SNPsplit to work with various different kinds of sequencing data including RNA-Seq, ChIP-Seq, Bisulfite-Seq or Hi-C opens new avenues for the integrative exploration of allele-specific data.
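The two ideas in the abstract, N-masking a reference and sorting reads by the base they carry at a masked position, can be sketched in a few lines of Python. This is a toy illustration and not SNPsplit's actual implementation: the SNP table, the function names `n_mask` and `assign_allele`, and the category labels are assumptions made for the example, and reads are treated as ungapped strings with a known 0-based start position rather than as SAM/BAM records with CIGAR strings.

```python
from typing import Dict, Tuple

# Hypothetical SNP table: 0-based reference position -> (genome 1 base, genome 2 base)
Snps = Dict[int, Tuple[str, str]]


def n_mask(reference: str, snps: Snps) -> str:
    """Replace every known SNP position with the ambiguity base 'N'."""
    bases = list(reference)
    for pos in snps:
        bases[pos] = "N"
    return "".join(bases)


def assign_allele(read: str, start: int, snps: Snps) -> str:
    """Classify an ungapped read aligned at 0-based position `start`.

    Counts the covered SNP positions whose read base matches genome 1 or
    genome 2, then reports which genome the read supports.
    """
    genome1_hits = 0
    genome2_hits = 0
    for pos, (base1, base2) in snps.items():
        offset = pos - start
        if 0 <= offset < len(read):  # the read covers this SNP position
            if read[offset] == base1:
                genome1_hits += 1
            elif read[offset] == base2:
                genome2_hits += 1
    if genome1_hits and genome2_hits:
        return "conflicting"  # evidence for both alleles in one read
    if genome1_hits:
        return "genome1"
    if genome2_hits:
        return "genome2"
    return "unassigned"  # no informative SNP covered


# Toy data (assumed for illustration only)
reference = "ACGTACGTAC"
snps: Snps = {3: ("T", "C"), 7: ("T", "G")}
masked = n_mask(reference, snps)
calls = [
    assign_allele("GTAC", 2, snps),
    assign_allele("GCACGG", 2, snps),
    assign_allele("GTACGG", 2, snps),
    assign_allele("AC", 0, snps),
]
```

A real sorter would additionally have to walk the alignment's CIGAR operations to find the read base at each genomic position, and would treat read pairs jointly.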

Highlights

  • Most functional NGS studies performed today still ignore the fact that many model organisms are diploid, and work on the averaged signal from the two alleles

  • Allele-specific analysis of next-generation sequencing reads is becoming an important tool to identify events such as allele-specific expression of genes (ASE), allele-specific binding of transcription factors or histones (ASB) or allele-specific methylation (ASM)

  • The simplest approach to allele-specific alignment is to align all reads to a single reference genome, but this introduces a bias as reads from the allele which is more similar to the reference are able to map more efficiently[3]. Another approach involves the generation of two personalised genomes by incorporating known single nucleotide polymorphic (SNP) positions followed by an alignment to both genomes and a post-processing step to compute the union of the separate alignments


Summary

Introduction

Most functional NGS studies performed today still ignore the fact that many model organisms are diploid, and work on the averaged signal from the two alleles. The simplest approach to allele-specific alignment is to align all reads to a single reference genome, but this introduces a bias as reads from the allele which is more similar to the reference are able to map more efficiently[3]. Another approach involves the generation of two personalised genomes by incorporating known SNP positions (and possibly InDels) followed by an alignment to both genomes and a post-processing step to compute the union of the separate alignments (used in different flavours in [4–6]). This approach is slower as it requires two separate mapping steps, and can still result in allelic bias because reads from one allele might not map uniquely, or might map to an incorrect location, in one of the genomes[3]. While a similar allele-specific functionality has been integrated into specialised applications, e.g. HiC-Pro[10], the unique capability to work with several different data types renders SNPsplit an ideal choice for correlation studies using allele-specific sequencing reads.
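The reference bias described above can be made concrete with a small mismatch count. The sketch below is illustrative only: the sequences are invented, the helper `mismatches` is hypothetical, and it assumes that an 'N' in the reference is penalised equally for every read base, which is a simplification of how real aligners score ambiguous positions.

```python
def mismatches(read: str, ref_window: str) -> int:
    """Count positions where the read differs from the reference window.

    Assumption: an 'N' in the reference counts as a mismatch for any base,
    so both alleles are penalised equally at masked positions.
    """
    return sum(1 for r, g in zip(read, ref_window) if g == "N" or r != g)


# Toy sequences (assumed for illustration only)
reference = "ACGTACGTAC"     # identical to allele 1 at both SNP sites
masked = "ACGNACGNAC"        # same reference with the two SNP sites masked
allele1_read = "ACGTACGTAC"  # read from the reference-like allele
allele2_read = "ACGCACGGAC"  # read from the other allele, differs at 2 sites

# Against the plain reference, only the allele 2 read is penalised.
plain_counts = (
    mismatches(allele1_read, reference),
    mismatches(allele2_read, reference),
)

# Against the N-masked reference, both reads receive the same penalty.
masked_counts = (
    mismatches(allele1_read, masked),
    mismatches(allele2_read, masked),
)
```

Under these assumptions the plain reference yields unequal mismatch counts for the two reads, whereas the masked reference yields equal counts, so neither allele is favoured by a mismatch threshold.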
