Abstract

Detecting allelic biases from high-throughput sequencing data requires an approach that maximizes sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated, user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the ‘allelome’ by automatically categorizing the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data, validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers, or transcription factor binding sites, and therefore provides a powerful tool to identify candidate cis-regulatory elements genome-wide.

Highlights

  • The Allelome.PRO pipeline depends on data obtained from genetically distinct individuals or pooled samples from two strains and requires three files to be provided by the user in order to start the fully automated analysis (Figure 1A)

  • A file defining single nucleotide polymorphisms (SNPs) between the two strains is required in browser extensible data (BED4) format
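
As an illustration of the BED4 layout mentioned above, the following minimal sketch reads four-column records (chromosome, 0-based start, exclusive end, name). The `Snp` type, the `parse_bed4` function, the coordinates and the `snp_a`/`snp_b` names are hypothetical; what Allelome.PRO expects in the name column is not stated here, so the sketch treats it as an opaque string.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str   # content is pipeline-specific; kept as an opaque string here


def parse_bed4(text: str) -> List[Snp]:
    """Parse tab-delimited BED4 text into Snp records (illustrative only)."""
    snps = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and optional browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"expected 4 tab-separated fields: {line!r}")
        chrom, start, end, name = fields[:4]
        snps.append(Snp(chrom, int(start), int(end), name))
    return snps


# Made-up coordinates and names, purely for demonstration.
example = "chr1\t3000184\t3000185\tsnp_a\nchr1\t3000440\t3000441\tsnp_b\n"
snps = parse_bed4(example)
```

Because BED intervals are half-open, a single-nucleotide position spans exactly one base, so `end - start` is 1 for each record in this example.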

Introduction

Mammalian cells are diploid and contain two copies of every gene locus, one inherited from the male, and one from the female parent. Mitochondrial genes, plus genes on the sex chromosomes in males, are the only exceptions to this rule. Since each diploid gene locus has the possibility to be expressed independently from either parental chromosome, different allelic states of expression can arise. Genes that deviate from biallelic expression by showing preferential expression of one of the two parental alleles are described as showing ‘monoallelic’ expression. Only a small subset of mammalian genes is known to show monoallelic expression. When either parental allele can show preferential expression, this is known as random monoallelic expression (RMAE). When one parental allele consistently and heritably shows preferential expression, this is known as parental-specific or imprinted monoallelic expression (IMAE).
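
The difference between a parental-specific bias and a strain-specific bias can be made concrete with read counts from a reciprocal cross. The sketch below is a simplified illustration, not the scoring used by Allelome.PRO: it omits the empirical false discovery rate entirely, and the function names, the `cutoff` of 0.7 and the `min_reads` threshold are assumptions chosen only for demonstration. Each cross is given as a pair of read counts (strain A, strain B); in the forward cross the mother is strain A, in the reverse cross the mother is strain B.

```python
from typing import Tuple


def allelic_ratio(reads_a: int, reads_b: int) -> float:
    """Fraction of allele-informative reads that come from strain A."""
    return reads_a / (reads_a + reads_b)


def classify(forward: Tuple[int, int], reverse: Tuple[int, int],
             cutoff: float = 0.7, min_reads: int = 20) -> str:
    """Assign a coarse allelic category from reciprocal-cross counts.

    forward: (strain A reads, strain B reads), mother = A, father = B
    reverse: (strain A reads, strain B reads), mother = B, father = A
    cutoff and min_reads are hypothetical values, not taken from the paper.
    """
    if sum(forward) < min_reads or sum(reverse) < min_reads:
        return "non-informative"
    fwd = allelic_ratio(*forward)  # strain A is the maternal allele here
    rev = allelic_ratio(*reverse)  # strain A is the paternal allele here
    low = 1 - cutoff
    # Same strain favored in both crosses: bias follows the strain.
    if fwd >= cutoff and rev >= cutoff:
        return "strain-biased (A)"
    if fwd <= low and rev <= low:
        return "strain-biased (B)"
    # Favored strain flips between crosses: bias follows the parent.
    if fwd >= cutoff and rev <= low:
        return "imprinted (maternal)"
    if fwd <= low and rev >= cutoff:
        return "imprinted (paternal)"
    return "biallelic"


maternal = classify((90, 10), (12, 88))
strain_a = classify((85, 15), (80, 20))
balanced = classify((50, 50), (45, 55))
sparse = classify((3, 2), (50, 50))
```

The reciprocal design is what separates the two cases: with only one cross, a 90:10 ratio in favor of strain A could reflect either the strain or the maternal origin of that allele.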
