Abstract

Epigenetic features such as histone and DNA modifications are important mechanisms for the regulation of gene expression and for cell and tissue development. As a result, extensive efforts are currently being undertaken using next-generation sequencing (NGS) to generate vast amounts of data on the epigenetic regulation of genomes. Several tools and frameworks for processing these NGS data have been developed in the last decade. Nevertheless, each user still bears the burden of integrating all these tasks to perform the analysis. This procedure is not only tedious but also resource-intensive because of the potentially large processing power involved. To automate, standardize and speed up the handling of NGS data, with a focus on ChIP-seq data, we present a user-friendly pipeline that automatically processes a list of sequencing data files and returns a ready-to-use purified table for subsequent modelling or analysis.

Highlights

  • Epigenetics is a major component of living systems that comprises all the mechanisms that help convert the genotype of an organism into phenotypic traits [1]

  • Data generated with these methods have recently provided emerging evidence for a role of epigenetics in gene regulation, gene expression and pathologies such as cancer and neurodegenerative diseases [3,4,5]

  • Models derived from Hi-C data have contributed to understanding the relationship between epigenetic features and the structure of the DNA [14]. As these methods usually require the acquisition of information through the description of "peaks", it would be useful to have an automated procedure to generate ready-to-use information tables from raw next-generation sequencing (NGS) data


Summary

INTRODUCTION

Epigenetics is a major component of living systems that comprises all the mechanisms that help convert the genotype (sum of all genetic information) of an organism into phenotypic traits (for instance, color of hair or number of digits) [1]. Models derived from Hi-C data have contributed to understanding the relationship between epigenetic features and the structure of the DNA [14]. As these methods usually require the acquisition of information (in the sense of information theory [15]) through the description of "peaks" (a cluster of sequencing events associated with a genomic position or region that forms one unit of information), it would be useful to have an automated procedure to generate ready-to-use information tables from raw NGS data. For each peak detected in any ChIP-seq experiment, the pipeline computes the status of the other samples at the exact same position via feature-to-feature comparisons such as correlations or clustering. As a result, this table contains exhaustive epigenetic information located in the raw input sequencing files. The pipeline can be adapted to include additional types of sequencing data, for instance DNA methylation states or gene expression data.
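The idea of a peak table can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the function names (`build_peak_table`, `pearson`), the half-open `(chromosome, start, end)` peak representation, and the binary present/absent status are all assumptions made for illustration. A real pipeline may instead record signal intensities or scores. Each peak found in any sample becomes one row, and each sample becomes one column stating whether that sample has an overlapping peak at the same position; two columns can then be compared, here with a Pearson correlation.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if both peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_peak_table(samples: Dict[str, List[Peak]]) -> List[dict]:
    """One row per peak seen in any sample, one 0/1 column per sample."""
    names = sorted(samples)
    # Collect every distinct peak from every sample as a candidate row.
    regions = sorted({peak for peaks in samples.values() for peak in peaks})
    table = []
    for region in regions:
        row = {"chrom": region[0], "start": region[1], "end": region[2]}
        for name in names:
            # Status of this sample at the same position (assumed binary).
            row[name] = int(any(overlaps(region, p) for p in samples[name]))
        table.append(row)
    return table


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equally long columns with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy input: two ChIP-seq samples with made-up peak coordinates.
samples = {
    "H3K4me3": [("chr1", 100, 200), ("chr1", 500, 600)],
    "H3K27ac": [("chr1", 150, 250), ("chr2", 10, 50)],
}
table = build_peak_table(samples)
for row in table:
    print(row)

# Feature-to-feature comparison between the two sample columns.
r = pearson([row["H3K4me3"] for row in table],
            [row["H3K27ac"] for row in table])
print(f"Pearson r = {r:.3f}")
```

Keeping one row per peak and one column per sample means that additional data types, such as DNA methylation or expression values, could in principle be added as further columns without changing the row structure.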

