Automated processing of NGS data from raw sequencing files to ready-to-use information tables for genome modeling

David Fournier,Martin Wieland,Robert Deelen,Susanne Gerber

doi:10.18547/gcb.2018.vol4.iss2.e100042

Abstract

Epigenetic features such as histone and DNA modifications are important mechanisms for the regulation of gene expression and for cell and tissue development. As a result, extensive efforts are currently being undertaken using next-generation sequencing (NGS) to generate vast amounts of data regarding the epigenetic regulation of genomes. Several tools and frameworks for the processing of these NGS data have been developed in the last decade. Nevertheless, each user still bears the challenge of integrating all these tasks to perform the analysis. This procedure is not only tedious but also resource-intensive because of the potentially large processing power involved. To automate, standardize and speed up the handling of NGS data, with a focus on ChIP-seq data, we present a user-friendly pipeline that automatically processes a list of sequencing data files and returns a ready-to-use purified table for subsequent modelling or analysis.

Highlights

Epigenetics is a strong component of living systems: it comprises all the mechanisms that help convert the genotype of an organism into phenotypic traits [1].
Data generated with these methods have recently provided emerging evidence for a role of epigenetics in gene regulation, gene expression and pathologies such as cancer and neurodegenerative diseases [3,4,5].
Models derived from Hi-C data have contributed to understanding the relationship between epigenetic features and the structure of the DNA [14]. As these methods usually require the acquisition of information by the description of "peaks", it would be useful to have an automated procedure to generate ready-to-use information tables from raw next-generation sequencing (NGS) data.

Summary

INTRODUCTION

Epigenetics is a strong component of living systems: it comprises all the mechanisms that help convert the genotype of an organism (the sum of all its genetic information) into phenotypic traits (for instance, hair color or number of digits) [1]. Models derived from Hi-C data have contributed to understanding the relationship between epigenetic features and the structure of the DNA [14]. As these methods usually require the acquisition of information (in the sense of information theory [15]) by the description of "peaks" (a cluster of sequencing events associated with a genomic position or region that forms one unit of information), it would be useful to have an automated procedure to generate ready-to-use information tables from raw NGS data. For each peak detected in any ChIP-seq experiment, the pipeline computes the status of the other samples at the exact same position via feature-to-feature comparisons such as correlations or clustering. As a result, the output table contains the exhaustive epigenetic information found in the raw input sequencing files. The pipeline can be adapted to include additional types of sequencing data, for instance DNA methylation states or gene expression data.
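To make the idea of a peak-centred information table concrete, the following is a minimal sketch and not the authors' implementation. It assumes that peaks are already called and available as (chromosome, start, end) tuples, and that the "status" of a sample at a position is simply whether it has an overlapping peak (1) or not (0). The function names `overlaps` and `build_peak_table`, the sample names and the coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed peak representation: (chromosome, start, end), half-open interval.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True when two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_peak_table(samples: Dict[str, List[Peak]]) -> List[dict]:
    """Build one row per peak found in any sample, with one 0/1 column per sample.

    A sample gets 1 in a row when it has a peak overlapping that row's position.
    This presence/absence encoding is an assumption made for illustration only.
    """
    # Collect every distinct peak across all samples, in a stable sorted order.
    all_peaks = sorted({peak for peaks in samples.values() for peak in peaks})
    table = []
    for peak in all_peaks:
        row = {"chrom": peak[0], "start": peak[1], "end": peak[2]}
        for name, peaks in samples.items():
            row[name] = int(any(overlaps(peak, other) for other in peaks))
        table.append(row)
    return table


# Hypothetical toy input: two ChIP-seq samples with made-up coordinates.
samples = {
    "H3K4me3": [("chr1", 100, 200), ("chr1", 500, 600)],
    "H3K27ac": [("chr1", 150, 250), ("chr2", 10, 50)],
}

peak_table = build_peak_table(samples)
for row in peak_table:
    print(row)
```

Each row of such a table describes one genomic position across all samples, so its sample columns could be passed to a correlation or clustering routine. A real pipeline would more likely store signal intensities than a binary flag, and would use an interval index rather than the quadratic scan shown here.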

METHODS

Findings

CONCLUSION


Journal: Genomics and Computational Biology	Publication Date: Mar 15, 2018
License type: CC BY 4.0
